Use of multiple recombination sites with unique specificity in recombinational cloning

ABSTRACT

The present invention provides compositions and methods for recombinational cloning. The compositions include vectors having multiple recombination sites with unique specificity. The methods permit the simultaneous cloning of two or more different nucleic acid molecules. In some embodiments the molecules are fused together while in other embodiments the molecules are inserted into distinct sites in a vector. The invention also generally provides for linking or joining through recombination a number of molecules and/or compounds (e.g., chemical compounds, drugs, proteins or peptides, lipids, nucleic acids, carbohydrates, etc.) which may be the same or different. Such molecules and/or compounds or combinations of such molecules and/or compounds can also be bound through recombination to various structures or supports according to the invention.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of U.S. application Ser. No. 10/640,422, filed Aug. 14, 2003, which claims the benefit of the filing date of U.S. Provisional Application No. 60/402,920, filed Aug. 14, 2002. U.S. application Ser. No. 10/640,422 also is a continuation-in-part of, and claims the benefit under 35 U.S.C. §120 of, U.S. application Ser. No. 09/732,914, filed Dec. 11, 2000, which claims the benefit of the filing dates of U.S. Provisional Application Nos. 60/169,983, filed Dec. 10, 1999, and 60/188,020, filed Mar. 9, 2000. The disclosures of all of these referenced applications are incorporated herein by reference in their entireties.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to the fields of biotechnology and molecular biology. In particular, the present invention relates to joining multiple nucleic acid molecules containing recombination sites, preferably using recombination sites having a unique specificity. The present invention also relates to cloning such joined nucleic acid molecules using recombinational cloning methods. The invention also relates to joining multiple peptides, and combinations of peptides and nucleic acid molecules through the use of recombination sites. Other molecules and compounds or combinations of molecules and compounds may also be joined through recombination sites according to the invention. Such peptides, nucleic acids and other molecules and/or compounds (or combinations thereof) may also be joined or bound through recombination to one or a number of supports or structures in accordance with the invention.

2. Related Art

Site-Specific Recombinases

Site-specific recombinases are proteins that are present in many organisms (e.g., viruses and bacteria) and have been characterized as having both endonuclease and ligase properties. These recombinases (along with associated proteins in some cases) recognize specific sequences of bases in a nucleic acid molecule and exchange the nucleic acid segments flanking those sequences. The recombinases and associated proteins are collectively referred to as “recombination proteins” (see, e.g., Landy, A., Current Opinion in Biotechnology 3:699-707 (1993)).

Numerous recombination systems from various organisms have been described. See, e.g., Hoess, et al., Nucleic Acids Research 14(6):2287 (1986); Abremski, et al., J. Biol. Chem. 261(1):391 (1986); Campbell, J. Bacteriol. 174(23):7495 (1992); Qian, et al., J. Biol. Chem. 267(11):7794 (1992); Araki, et al., J. Mol. Biol. 225(1):25 (1992); Maeser and Kahnmann, Mol. Gen. Genet. 230:170-176 (1991); Esposito, et al., Nucl. Acids Res. 25(18):3605 (1997). Many of these belong to the integrase family of recombinases (Argos, et al., EMBO J. 5:433-440 (1986); Voziyanov, et al., Nucl. Acids Res. 27:930 (1999)). Perhaps the best studied of these are the Integrase/att system from bacteriophage λ (Landy, A. Current Opinions in Genetics and Devel. 3:699-707 (1993)), the Cre/loxP system from bacteriophage P1 (Hoess and Abremski (1990) In Nucleic Acids and Molecular Biology, vol. 4. Eds.: Eckstein and Lilley, Berlin-Heidelberg: Springer-Verlag; pp. 90-109), and the FLP/FRT system from the Saccharomyces cerevisiae 2μ circle plasmid (Broach, et al., Cell 29:227-234 (1982)).

Transposons

Transposons are mobile genetic elements. Transposons are structurally variable, being described as simple or compound, but typically encode a transposition catalyzing enzyme, termed a transposase, flanked by DNA sequences organized in inverted orientations. For a more thorough discussion of the characteristics of transposons, one may consult Mobile Genetic Elements, D. J. Sherratt, Ed., Oxford University Press (1995) and Mobile DNA, D. E. Berg and M. M. Howe, Eds., American Society for Microbiology (1989), Washington, D.C., both of which are specifically incorporated herein by reference.

Transposons have been used to insert DNA into target DNA. As a general rule, the insertion of transposons into target DNA is a random event. One exception to this rule is the insertion of transposon Tn7. Transposon Tn7 can integrate itself into a specific site in the E. coli genome as one part of its life cycle (Stellwagen, A. E., and Craig, N. L. Trends in Biochemical Sciences 23, 486-490, 1998, specifically incorporated herein by reference). This site specific insertion has been used in vivo to manipulate the baculovirus genome (Lucklow et al., J. Virol. 67:4566-4579 (1993), specifically incorporated herein by reference). The site specificity of Tn7 is atypical of transposable elements, whose hallmark is movement to random positions in acceptor DNA molecules. For the purposes of this application, transposition will be used to refer to random or quasi-random movement, unless otherwise specified, whereas recombination will be used to refer to site specific recombination events. Thus, the site specific insertion of Tn7 into the attTn7 site would be referred to as a recombination event while the random insertion of Tn7 would be referred to as a transposition event.

York, et al. (Nucleic Acids Research, 26(8):1927-1933, (1998)) disclose an in vitro method for the generation of nested deletions based upon an intramolecular transposition within a plasmid using Tn5. A vector containing a kanamycin resistance gene flanked by two 19 base pair Tn5 transposase recognition sequences and a target DNA sequence was incubated in vitro in the presence of purified transposase protein. Under the conditions of low DNA concentration employed, the intramolecular transposition reaction was favored and was successfully used to generate a set of nested deletions in the target DNA. The authors suggested that this system might be used to generate C-terminal truncations in a protein encoded by the target DNA by the inclusion of stop signals in all three reading frames adjacent to the recognition sequences. In addition, the authors suggested that the inclusion of a His tag and kinase region might be used to generate N-terminal deletion proteins for further analysis.

Devine, et al., (Nucleic Acids Research, 22:3765-3772 (1994) and U.S. Pat. Nos. 5,677,170 and 5,843,772, all of which are specifically incorporated herein by reference) disclose the construction of artificial transposons for the insertion of DNA segments into recipient DNA molecules in vitro. The system makes use of the insertion-catalyzing enzyme of yeast TY1 virus-like particles as a source of transposase activity. The DNA segment of interest is cloned, using standard methods, between the ends of the transposon-like element TY1. In the presence of the TY1 insertion-catalyzing enzyme, the resulting element integrates randomly into a second target DNA molecule.

Another class of mobile genetic elements are integrons. Integrons generally consist of a 5′- and a 3′-conserved sequence flanking a variable sequence. Typically, the 5′-conserved sequence contains the coding information for an integrase protein. The integrase protein may catalyze site-specific recombination at a variety of recombination sites including attI, attC as well as other types of sites (see Francia et al., J. Bacteriology 181(21):6844-6849, 1999, and references cited therein).

Recombination Sites

Whether the reactions discussed above are termed recombination, transposition or integration and are catalyzed by a recombinase or integrase, they share the key feature of specific recognition sequences, often termed “recombination sites,” on the nucleic acid molecules participating in the reactions. These recombination sites are sections or segments of nucleic acid on the participating nucleic acid molecules that are recognized and bound by the recombination proteins during the initial stages of integration or recombination. For example, the recombination site for Cre recombinase is loxP, which is a 34 base pair sequence comprised of two 13 base pair inverted repeats (serving as the recombinase binding sites) flanking an 8 base pair core sequence. (See FIG. 1 of Sauer, B., Curr. Opin. Biotech. 5:521-527 (1994).) Other examples of recognition sequences include the attB, attP, attL, and attR sequences, which are recognized by the recombination protein λ Int. attB is an approximately 25 base pair sequence containing two 9 base pair core-type Int binding sites and a 7 base pair overlap region, while attP is an approximately 240 base pair sequence containing core-type Int binding sites and arm-type Int binding sites as well as sites for auxiliary proteins integration host factor (IHF), FIS and excisionase (Xis). (See Landy, Curr. Opin. Biotech. 3:699-707 (1993).)
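The loxP layout described above (two 13 base pair inverted repeats flanking an 8 base pair core, 34 base pairs in total) can be checked with a short sketch. The sequence used here is the commonly cited wild-type loxP sequence; it is not given in this text and is included only as an assumption for illustration, and the function names are hypothetical.

```python
# Illustrative sketch only. The loxP sequence below is the commonly cited
# wild-type sequence and is an assumption (it does not appear in the text).
LOXP = (
    "ATAACTTCGTATA"   # left 13 bp arm (recombinase binding site)
    "ATGTATGC"        # 8 bp core
    "TATACGAAGTTAT"   # right 13 bp arm (inverted repeat of the left arm)
)

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def split_lox_site(site: str, arm_len: int = 13, core_len: int = 8):
    """Split a lox-like site into (left arm, core, right arm)."""
    if len(site) != 2 * arm_len + core_len:
        raise ValueError("unexpected site length")
    left = site[:arm_len]
    core = site[arm_len:arm_len + core_len]
    right = site[arm_len + core_len:]
    return left, core, right


def has_inverted_repeats(site: str) -> bool:
    """True if the right arm is the reverse complement of the left arm."""
    left, _core, right = split_lox_site(site)
    return right == reverse_complement(left)


left, core, right = split_lox_site(LOXP)
print(len(LOXP), len(left), len(core), len(right), has_inverted_repeats(LOXP))
```

The check treats "inverted repeat" as exact reverse complementarity of the two arms, which is a simplification suited to this one example.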

Stop Codons and Suppressor tRNAs

Three codons are used by both eukaryotes and prokaryotes to signal the end of a gene. When transcribed into mRNA, the codons have the following sequences: UAG (amber), UGA (opal) and UAA (ochre). Under most circumstances, the cell does not contain any tRNA molecules that recognize these codons. Thus, when a ribosome translating an mRNA reaches one of these codons, the ribosome stalls and falls off the RNA, terminating translation of the mRNA. The release of the ribosome from the mRNA is mediated by specific factors (see S. Mottagui-Tabar, NAR 26(11), 2789, 1998). A gene with an in-frame stop codon (TAA, TAG, or TGA) will ordinarily encode a protein with a native carboxy terminus. However, suppressor tRNAs can result in the insertion of amino acids and continuation of translation past stop codons.

Mutant tRNA molecules that recognize what are ordinarily stop codons suppress the termination of translation of an mRNA molecule and are termed suppressor tRNAs. A number of such suppressor tRNAs have been found. Examples include, but are not limited to, the supE, supP, supD, supF and supZ suppressors, which suppress the termination of translation of the amber stop codon; the supB, glT, supL, supN, supC and supM suppressors, which suppress the function of the ochre stop codon; and glyT, trpT and Su-9, which suppress the function of the opal stop codon. In general, suppressor tRNAs contain one or more mutations in the anti-codon loop of the tRNA that allow the tRNA to base pair with a codon that ordinarily functions as a stop codon. The mutant tRNA is charged with its cognate amino acid residue and the cognate amino acid residue is inserted into the translating polypeptide when the stop codon is encountered. For a more detailed discussion of suppressor tRNAs, the reader may consult Eggertsson, et al., (1988) Microbiological Review 52(3):354-374, and Engelberg-Kulka, et al. (1996) in Escherichia coli and Salmonella Cellular and Molecular Biology, Chapter 60, pps 909-921, Neidhardt, et al. eds., ASM Press, Washington, D.C.
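The effect of a suppressor tRNA on translation can be illustrated with a toy translator. This is a minimal sketch under stated assumptions: the codon table covers only the codons in the example message, the mRNA is invented, the inserted residue ("Q") is a hypothetical choice, and suppression is modeled as all-or-nothing even though real suppression is only partially efficient.

```python
# Toy model of stop-codon suppression. All names and the example mRNA are
# hypothetical; suppression is treated as 100% efficient for simplicity.
CODON_TABLE = {"AUG": "M", "GCU": "A", "GGU": "G", "AAA": "K"}  # partial table
STOP_CODONS = {"UAG": "amber", "UGA": "opal", "UAA": "ochre"}


def translate(mrna, suppressors=None):
    """Translate an mRNA codon by codon.

    `suppressors` maps a stop codon (e.g. "UAG") to the amino acid that a
    suppressor tRNA would insert at that codon. Unsuppressed stop codons
    terminate translation.
    """
    suppressors = suppressors or {}
    peptide = []
    usable_length = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            if codon in suppressors:
                peptide.append(suppressors[codon])  # read-through
                continue
            break  # termination
        peptide.append(CODON_TABLE[codon])
    return "".join(peptide)


mrna = "AUGGCUUAGGGUAAAUAA"  # AUG GCU UAG GGU AAA UAA
print(translate(mrna))                # MA    (terminates at the amber codon)
print(translate(mrna, {"UAG": "Q"}))  # MAQGK (amber suppressed, ochre still stops)
```

In the second call only the amber codon is suppressed, so translation continues to the downstream ochre codon and stops there.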

Mutations which enhance the efficiency of termination suppressors, i.e., increase the read through of the stop codon, have been identified. These include, but are not limited to, mutations in the uar gene (also known as the prfA gene), mutations in the ups gene, mutations in the sueA, sueB and sueC genes, mutations in the rpsD (ramA) and rpsE (spcA) genes and mutations in the rplL gene.

Under ordinary circumstances, host cells would not be expected to be healthy if suppression of stop codons is too efficient. This is because, of the thousands or tens of thousands of genes in a genome, a significant fraction will naturally have one of the three stop codons; complete read-through of these would result in a large number of aberrant proteins containing additional amino acids at their carboxy termini. If some level of suppressing tRNA is present, there is a race between the incorporation of the amino acid and the release of the ribosome. Higher levels of tRNA may lead to more read-through, although other factors, such as the codon context, can influence the efficiency of suppression.

Organisms ordinarily have multiple genes for tRNAs. Combined with the redundancy of the genetic code (multiple codons for many of the amino acids), mutation of one tRNA gene to a suppressor tRNA status does not lead to high levels of suppression. The TAA stop codon is the strongest, and most difficult to suppress. The TGA is the weakest, and naturally (in E. coli) leaks to the extent of 3%. The TAG (amber) codon is relatively tight, with a read-through of about 1% without suppression. In addition, the amber codon can be suppressed with efficiencies on the order of 50% with naturally occurring suppressor mutants.

Suppression has been studied for decades in bacteria and bacteriophages. In addition, suppression is known in yeast, flies, plants and other eukaryotic cells including mammalian cells. For example, Capone, et al. (Molecular and Cellular Biology 6(9):3059-3067, 1986) demonstrated that suppressor tRNAs derived from mammalian tRNAs could be used to suppress a stop codon in mammalian cells. A copy of the E. coli chloramphenicol acetyltransferase (cat) gene having a stop codon in place of the codon for serine 27 was transfected into mammalian cells along with a gene encoding a human serine tRNA which had been mutated to form an amber, ochre, or opal suppressor derivative of the gene. Successful expression of the cat gene was observed. An inducible mammalian amber suppressor has been used to suppress a mutation in the replicase gene of polio virus and cell lines expressing the suppressor were successfully used to propagate the mutated virus (Sedivy, et al., (1987) Cell 50: 379-389). The context effects on the efficiency of suppression of stop codons by suppressor tRNAs have been shown to be different in mammalian cells as compared to E. coli (Phillips-Jones, et al., (1995) Molecular and Cellular Biology 15(12): 6593-6600, Martin, et al., (1993) Biochemical Society Transactions 21:846-851). Since some human diseases are caused by nonsense mutations in essential genes, the potential of suppression for gene therapy has long been recognized (see Temple, et al. (1982) Nature 296(5857):537-40). The suppression of single and double nonsense mutations introduced into the diphtheria toxin A-gene has been used as the basis of a binary system for toxin gene therapy (Robinson, et al., (1995) Human Gene Therapy 6:137-143).

Conventional Nucleic Acid Cloning

The cloning of nucleic acid segments currently occurs as a daily routine in many research labs and as a prerequisite step in many genetic analyses. The purposes of these clonings vary; however, two general purposes can be considered: (1) the initial cloning of nucleic acid from large DNA or RNA segments (chromosomes, YACs, PCR fragments, mRNA, etc.), done in a relative handful of known vectors such as pUC, pGem, pBlueScript, and (2) the subcloning of these nucleic acid segments into specialized vectors for functional analysis. A great deal of time and effort is expended in the transfer of nucleic acid segments from the initial cloning vectors to the more specialized vectors. This transfer is called subcloning.

The basic methods for cloning have been known for many years and have changed little during that time. A typical cloning protocol is as follows:

(1) digest the nucleic acid of interest with one or two restriction enzymes;

(2) gel purify the nucleic acid segment of interest when known;

(3) prepare the vector by cutting with appropriate restriction enzymes, treating with alkaline phosphatase, gel purifying, etc., as appropriate;

(4) ligate the nucleic acid segment to the vector, with appropriate controls to eliminate background of uncut and self-ligated vector;

(5) introduce the resulting vector into an E. coli host cell;

(6) pick selected colonies and grow small cultures overnight;

(7) make nucleic acid minipreps; and

(8) analyze the isolated plasmid on agarose gels (often after diagnostic restriction enzyme digestion) or by PCR.

The specialized vectors used for subcloning nucleic acid segments are functionally diverse. These include but are not limited to: vectors for expressing nucleic acid molecules in various organisms; for regulating nucleic acid molecule expression; for providing tags to aid in protein purification or to allow tracking of proteins in cells; for modifying the cloned nucleic acid segment (e.g., generating deletions); for the synthesis of probes (e.g., riboprobes); for the preparation of templates for nucleic acid sequencing; for the identification of protein coding regions; for the fusion of various protein-coding regions; to provide large amounts of the nucleic acid of interest, etc. It is common that a particular investigation will involve subcloning the nucleic acid segment of interest into several different specialized vectors.

As known in the art, simple subclonings can be done in one day (e.g., the nucleic acid segment is not large and the restriction sites are compatible with those of the subcloning vector). However, many other subclonings can take several weeks, especially those involving unknown sequences, long fragments, toxic genes, unsuitable placement of restriction sites, high backgrounds, impure enzymes, etc. One of the most tedious and time consuming types of subcloning involves the sequential addition of several nucleic acid segments to a vector in order to construct a desired clone. One example of this type of cloning is in the construction of gene targeting vectors. Gene targeting vectors typically include two nucleic acid segments, each identical to a portion of the target gene, flanking a selectable marker. In order to construct such a vector, it may be necessary to clone each segment sequentially, i.e., first one gene fragment is inserted into the vector, then the selectable marker and then the second fragment of the target gene. This may require a number of digestion, purification, ligation and isolation steps for each fragment cloned. Subcloning nucleic acid fragments is thus often viewed as a chore to be done as few times as possible.

Several methods for facilitating the cloning of nucleic acid segments have been described, e.g., as in the following references.

Ferguson, J., et al., Gene 16:191 (1981), disclose a family of vectors for subcloning fragments of yeast nucleic acids. The vectors encode kanamycin resistance. Clones of longer yeast nucleic acid segments can be partially digested and ligated into the subcloning vectors. If the original cloning vector conveys resistance to ampicillin, no purification is necessary prior to transformation, since the selection will be for kanamycin.

Hashimoto-Gotoh, T., et al., Gene 41:125 (1986), disclose a subcloning vector with unique cloning sites within a streptomycin sensitivity gene; in a streptomycin-resistant host, only plasmids with inserts or deletions in the dominant sensitivity gene will survive streptomycin selection.

Notwithstanding the improvements provided by these methods, traditional subclonings using restriction and ligase enzymes are time consuming and relatively unreliable. Considerable labor is expended, and if two or more days later the desired subclone cannot be found among the candidate plasmids, the entire process must then be repeated with alternative conditions attempted.

Recombinational Cloning

Cloning systems that utilize recombination at defined recombination sites have been previously described in the related applications listed above, and in U.S. application Ser. No. 09/177,387, filed Oct. 23, 1998; U.S. application Ser. No. 09/517,466, filed Mar. 2, 2000; and U.S. Pat. Nos. 5,888,732 and 6,143,557, all of which are specifically incorporated herein by reference. In brief, the GATEWAY™ Cloning System, described in this application and the applications referred to in the related applications section, utilizes vectors that contain at least one recombination site to clone desired nucleic acid molecules in vivo or in vitro. More specifically, the system utilizes vectors that contain at least two different site-specific recombination sites based on the bacteriophage lambda system (e.g., att1 and att2) that are mutated from the wild-type (att0) sites. Each mutated site has a unique specificity for its cognate partner att site (i.e., its binding partner recombination site) of the same type (for example attB1 with attP1, or attL1 with attR1) and will not cross-react with recombination sites of the other mutant type or with the wild-type att0 site. Different site specificities allow directional cloning or linkage of desired molecules, thus providing desired orientation of the cloned molecules. Nucleic acid fragments flanked by recombination sites are cloned and subcloned using the GATEWAY™ system by replacing a selectable marker (for example, ccdB) flanked by att sites on the recipient plasmid molecule, sometimes termed the Destination Vector. Desired clones are then selected by transformation of a ccdB sensitive host strain and positive selection for a marker on the recipient molecule. Similar strategies for negative selection (e.g., use of toxic genes) can be used in other organisms, for example with thymidine kinase (TK) in mammals and insects.
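The pairing rule stated above (attB with attP, attL with attR, and only between sites of the same specificity) can be expressed as a small compatibility check. The sketch below is a simplified model, not part of the GATEWAY™ system itself: the class and function names are hypothetical, site names are assumed to follow the pattern "att" + kind + number, and recombination is treated as strictly all-or-nothing.

```python
from typing import NamedTuple

# Partner kinds named in the text: attB pairs with attP, attL with attR.
PARTNER_KIND = {"B": "P", "P": "B", "L": "R", "R": "L"}


class AttSite(NamedTuple):
    kind: str         # "B", "P", "L" or "R"
    specificity: int  # 1 for att1, 2 for att2, 0 for wild-type att0


def parse_att(name: str) -> AttSite:
    """Parse a site name such as 'attB1' (naming pattern is an assumption)."""
    if len(name) < 5 or not name.startswith("att") or name[3] not in PARTNER_KIND:
        raise ValueError(f"not an att site name: {name!r}")
    return AttSite(kind=name[3], specificity=int(name[4:]))


def can_recombine(a: str, b: str) -> bool:
    """True if the two sites are cognate partners with the same specificity."""
    site_a, site_b = parse_att(a), parse_att(b)
    return (PARTNER_KIND[site_a.kind] == site_b.kind
            and site_a.specificity == site_b.specificity)


print(can_recombine("attB1", "attP1"))  # True
print(can_recombine("attL1", "attR1"))  # True
print(can_recombine("attB1", "attP2"))  # False: different mutant specificity
print(can_recombine("attB1", "attP0"))  # False: no cross-reaction with att0
```

Because each specificity pairs only with itself, flanking an insert with att1 on one side and att2 on the other fixes its orientation in the recipient, which is the directional cloning property described above.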

Mutating specific residues in the core region of the att site can generate a large number of different att sites. As with the att1 and att2 sites utilized in GATEWAY™, each additional mutation potentially creates a novel att site with unique specificity that will recombine only with its cognate partner att site bearing the same mutation and will not cross-react with any other mutant or wild-type att site. Novel mutated att sites (e.g., attB1-10, attP1-10, attR1-10 and attL1-10) are described in previous patent application Ser. No. 09/517,466, filed Mar. 2, 2000, which is specifically incorporated herein by reference. Other recombination sites having unique specificity (i.e., a first site will recombine with its corresponding site and will not recombine or not substantially recombine with a second site having a different specificity) may be used to practice the present invention. Examples of suitable recombination sites include, but are not limited to, loxP sites; loxP site mutants, variants or derivatives such as loxP511 (see U.S. Pat. No. 5,851,808); frt sites; frt site mutants, variants or derivatives; dif sites; dif site mutants, variants or derivatives; psi sites; psi site mutants, variants or derivatives; cer sites; and cer site mutants, variants or derivatives. The present invention provides novel methods using such recombination sites to join or link multiple nucleic acid molecules or segments and more specifically to clone such multiple segments (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, seventy-five, one hundred, two hundred, etc.) into one or more vectors (e.g., two, three, four, five, seven, ten, twelve, etc.) containing one or more recombination sites (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, seventy-five, one hundred, two hundred, etc.), such as any GATEWAY™ Vector including Destination Vectors.

BRIEF SUMMARY OF THE INVENTION

The present invention generally provides materials and methods for joining or combining two or more (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, seventy-five, one hundred, two hundred, etc.) segments or molecules of nucleic acid by the recombination reaction between recombination sites, at least one of which is present on each molecule or segment. Such recombination reactions to join multiple nucleic acid molecules according to the invention may be conducted in vivo (e.g., within a cell, tissue, organ or organism) or in vitro (e.g., cell-free systems). Accordingly, the invention relates to methods for creating novel or unique combinations of nucleic acid molecules and to the nucleic acid molecules created by such methods. The invention also relates to hosts and host cells comprising the nucleic acid molecules of the invention. The invention also relates to kits for carrying out the methods of the invention, and to compositions for carrying out the methods of the invention as well as compositions made while carrying out the methods of the invention.

The nucleic acid molecules created by the methods of the invention may be used for any purpose known to those skilled in the art. For example, the nucleic acid molecules of the invention may be used to express proteins or peptides encoded by the nucleic acid molecules and may be used to create novel fusion proteins by expressing different sequences linked by the methods of the invention. Such expression can be accomplished in a cell or by using well known in vitro expression/transcription systems. In one aspect, at least one (and preferably two or more) of the nucleic acid molecules or segments to be joined by the methods of the invention comprise at least two recombination sites, although each molecule may comprise multiple recombination sites (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.). Such recombination sites (which may be the same or different) may be located at various positions in each nucleic acid molecule or segment and the nucleic acid used in the invention may have various sizes and be in different forms including circular, supercoiled, linear, and the like. The nucleic acid molecules used in the invention may also comprise one or more vectors or one or more sequences allowing the molecule to function as a vector in a host cell (such as an origin of replication). The nucleic acid molecules of the invention may also comprise non-coding segments (e.g., intronic, untranslated, or other segments) that serve structural or other non-expressive functions.

In a preferred aspect, the nucleic acid molecules or segments for use in the invention are linear molecules having at least one recombination site at or near at least one terminus of the molecule and preferably comprise at least one recombination site at or near both termini of the molecule. In another preferred aspect, when multiple recombination sites are located on a nucleic acid molecule of interest, such sites do not substantially recombine or do not recombine with each other on that molecule. In this embodiment, the corresponding binding partner recombination sites preferably are located on one or more other nucleic acid molecules to be linked or joined by the methods of the invention. For instance, a first nucleic acid molecule used in the invention may comprise at least a first and second recombination site and a second nucleic acid molecule may comprise at least a third and fourth recombination site, wherein the first and second sites do not recombine with each other and the third and fourth sites do not recombine with each other, although the first and third and/or the second and fourth sites may recombine.
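The four-site arrangement in the preceding example can be modeled as a pairing table. This is a minimal sketch under stated assumptions: the site labels and the table of recombining pairs are hypothetical placeholders, and recombination is treated as a yes/no relation rather than a matter of degree.

```python
from itertools import combinations, product

# Hypothetical labels: the first site recombines with the third,
# and the second with the fourth, as in the example in the text.
RECOMBINING_PAIRS = {
    frozenset({"site1", "site3"}),
    frozenset({"site2", "site4"}),
}


def recombines(a, b, pairs=RECOMBINING_PAIRS):
    """True if sites a and b are listed as a recombining pair."""
    return frozenset({a, b}) in pairs


def self_compatible(molecule_sites, pairs=RECOMBINING_PAIRS):
    """True if no two sites on the same molecule recombine with each other."""
    return not any(recombines(a, b, pairs)
                   for a, b in combinations(molecule_sites, 2))


def joinable_pairs(first, second, pairs=RECOMBINING_PAIRS):
    """List the site pairs through which two molecules could be joined."""
    return [(a, b) for a, b in product(first, second) if recombines(a, b, pairs)]


first_molecule = ["site1", "site2"]
second_molecule = ["site3", "site4"]
print(self_compatible(first_molecule), self_compatible(second_molecule))  # True True
print(joinable_pairs(first_molecule, second_molecule))
# [('site1', 'site3'), ('site2', 'site4')]
```

A molecule carrying both members of a recombining pair, such as `["site1", "site3"]`, would fail `self_compatible`, which corresponds to the intramolecular recombination the preferred aspect avoids.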

The nucleic acid molecules to be joined by the methods of the invention (i.e., the “starting molecules”) are used to produce one or more (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, seventy-five, one hundred, two hundred, etc.) hybrid molecules (e.g., the “product nucleic acid molecules”) containing all or a portion of the starting molecules. The starting molecules can be any nucleic acid molecule derived from any source or produced by any method. Such molecules may be derived from natural sources (such as cells (e.g., prokaryotic cells such as bacterial cells, eukaryotic cells such as fungal cells (e.g., yeast cells), plant cells, animal cells (e.g., mammalian cells such as human cells), etc.), viruses, tissues, organs from any animal or non-animal source, and organisms) or may be non-natural (e.g., derivative nucleic acids) or synthetically derived. Such molecules may also include prokaryotic and eukaryotic vectors, plasmids, integration sequences (e.g., transposons), phage or viral vectors, phagemids, cosmids, and the like. The segments or molecules for use in the invention may be produced by any means known to those skilled in the art including, but not limited to, amplification such as by PCR, isolation from natural sources, chemical synthesis, shearing or restriction digest of larger nucleic acid molecules (such as genomic or cDNA), transcription, reverse transcription and the like, and recombination sites may be added to such molecules by any means known to those skilled in the art including ligation of adapters containing recombination sites, attachment with topoisomerases of adapters containing recombination sites, attachment with topoisomerases of adapter primers containing recombination sites, amplification or nucleic acid synthesis using primers containing recombination sites, insertion or integration of nucleic acid molecules (e.g., transposons or integration sequences) containing recombination sites, etc. In a preferred aspect, the nucleic acid molecules used in the invention are populations of molecules such as nucleic acid libraries or cDNA libraries.

Recombination sites for use in the invention may be any recognition sequence on a nucleic acid molecule which participates in a recombination reaction mediated or catalyzed by one or more recombination proteins. In those embodiments of the present invention utilizing more than one (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) recombination sites, such recombination sites may be the same or different and may recombine with each other or may not recombine or not substantially recombine with each other. Recombination sites contemplated by the invention also include mutants, derivatives or variants of wild-type or naturally occurring recombination sites. Preferred recombination site modifications include those that enhance recombination, such enhancements being selected from the group consisting of substantially (i) favoring integrative recombination; (ii) favoring excisive recombination; (iii) relieving the requirement for host factors; (iv) increasing the efficiency of co-integrate or product formation; and (v) increasing the specificity of co-integrate or product formation.

Preferred modifications to the recombination sites include those that enhance recombination specificity, remove one or more stop codons, and/or avoid hair-pin formation. Desired modifications can also be made to the recombination sites to include desired amino acid changes to the transcription or translation product (e.g., mRNA or protein) when translation or transcription occurs across the modified recombination site. Preferred recombination sites used in accordance with the invention include att sites, frt sites, dif sites, psi sites, cer sites, and lox sites or mutants, derivatives and variants thereof (or combinations thereof). Recombination sites contemplated by the invention also include portions of such recombination sites. Depending on the recombination site specificity used, the invention allows directional linking of nucleic acid molecules to provide desired orientations of the linked molecules or non-directional linking to produce random orientations of the linked molecules.

In specific embodiments, the recombination sites which recombine with each other in compositions and methods of the invention comprise att sites having identical seven base pair overlap regions. In specific embodiments of the invention, the first three nucleotides of these seven base pair overlap regions comprise nucleotide sequences selected from the group consisting of AAA, AAC, AAG, AAT, ACA, ACC, ACG, ACT, AGA, AGC, AGG, AGT, ATA, ATC, ATG, ATT, CAA, CAC, CAG, CAT, CCA, CCC, CCG, CCT, CGA, CGC, CGG, CGT, CTA, CTC, CTG, CTT, GAA, GAC, GAG, GAT, GCA, GCC, GCG, GCT, GGA, GGC, GGG, GGT, GTA, GTC, GTG, GTT, TAA, TAC, TAG, TAT, TCA, TCC, TCG, TCT, TGA, TGC, TGG, TGT, TTA, TTC, TTG, and TTT.
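The group recited above is the complete set of 64 possible trinucleotides, which can be generated rather than listed by hand. The sketch below also includes a check for identical seven base pair overlap regions; the overlap sequences used in the example are invented placeholders, and the function names are hypothetical.

```python
from itertools import product

# All 4**3 = 64 trinucleotides, in the same alphabetical order as the list
# in the text (AAA, AAC, AAG, AAT, ACA, ..., TTT).
TRIPLETS = ["".join(bases) for bases in product("ACGT", repeat=3)]


def overlaps_match(overlap_a: str, overlap_b: str) -> bool:
    """True if two seven base pair overlap regions are identical."""
    if len(overlap_a) != 7 or len(overlap_b) != 7:
        raise ValueError("overlap regions must be seven bases long")
    return overlap_a == overlap_b


def overlap_prefix(overlap: str) -> str:
    """Return the first three nucleotides of a seven base pair overlap."""
    if len(overlap) != 7:
        raise ValueError("overlap regions must be seven bases long")
    return overlap[:3]


print(len(TRIPLETS), TRIPLETS[:4], TRIPLETS[-1])

# Hypothetical overlap sequences, used only to exercise the functions.
print(overlaps_match("AAATACG", "AAATACG"))        # True
print(overlaps_match("AAATACG", "AACTACG"))        # False
print(overlap_prefix("AACTACG") in TRIPLETS)       # True
```

Since every trinucleotide appears in the group, the recited list places no restriction on the first three overlap positions beyond requiring that recombining partners share the same overlap.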

Each starting nucleic acid molecule may comprise, in addition to one or more recombination sites (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.), a variety of sequences (or combinations thereof) including, but not limited to, sequences suitable for use as primer sites (e.g., sequences to which a primer such as a sequencing primer or amplification primer may hybridize to initiate nucleic acid synthesis, amplification or sequencing), transcription or translation signals or regulatory sequences such as promoters or enhancers, ribosomal binding sites, Kozak sequences, start codons, transcription and/or translation termination signals such as stop codons (which may be optionally suppressed by one or more suppressor tRNA molecules), origins of replication, selectable markers, and genes or portions of genes which may be used to create protein fusions (e.g., N-terminal or carboxy terminal) such as glutathione S-transferase (GST), β-glucuronidase (GUS), histidine tags (HIS6), green fluorescent protein (GFP), yellow fluorescent protein (YFP), cyan fluorescent protein (CFP), open reading frame (ORF) sequences, and any other sequence of interest which may be desired or used in various molecular biology techniques including sequences for use in homologous recombination (e.g., for use in gene targeting).

In one aspect, the invention provides methods for producing populations of hybrid nucleic acid molecules comprising (a) mixing at least a first population of nucleic acid molecules comprising one or more recombination sites with at least one target nucleic acid molecule comprising one or more recombination sites; and (b) causing some or all of the nucleic acid molecules of the at least first population to recombine with all or some of the target nucleic acid molecules, thereby forming the populations of hybrid nucleic acid molecules. In certain specific embodiments of the above methods, the recombination is caused by mixing the first population of nucleic acid molecules and the target nucleic acid molecule with one or more recombination proteins under conditions which favor the recombination to produce hybrid nucleic acid molecules. In other specific embodiments, methods of the invention further comprise mixing the hybrid nucleic acid molecules with at least a second population of nucleic acid molecules comprising one or more recombination sites to produce a second population of product nucleic acid molecules. Alternatively, the first population, second population and target nucleic acid molecules may be mixed together to form a hybrid population through recombination. In additional specific embodiments, methods of the invention further comprise selecting for the populations of hybrid nucleic acid molecules generated by the methods described above. In yet additional specific embodiments, methods of the invention further comprise selecting for the population of hybrid nucleic acid molecules, against the first population of nucleic acid molecules, against the target nucleic acid molecules, and/or against the second population of nucleic acid molecules.

In related embodiments, the invention provides methods for recombining a first nucleic acid segment containing a first recombination site, a second nucleic acid segment containing a second and third recombination site, and a third nucleic acid segment containing a fourth recombination site, wherein the first, second, or third nucleic acid segments may be identical nucleic acid segments or populations of nucleic acid molecules, such that recombination generates a linear or closed circular product comprising the first, second and third nucleic acid segments. Further, members of the recombination products may be amplified using oligonucleotides which either contain or do not contain recombination sites and are homologous or degenerate to the first or third nucleic acid segments. Thus, for example, by performing amplification with primers specific for the first and third nucleic acid segments, a product comprising the first-second-third hybrid molecules can be amplified, where other undesired molecules (e.g., products comprising the first-second hybrid molecules) are not amplified. In this way, amplification can be used to select for desired products and against undesired products. Such amplification can be designed to select for any desired products or intermediates of a recombination reaction. For example, four different molecules (e.g., A, B, C, and D) can be joined and various intermediate products can be selected for (e.g., A-B-C, or A-B) using primers designed to amplify the desired products (e.g., primers corresponding to molecules A and C, when A-B-C is amplified and A and B when A-B is amplified). The resulting amplified products may then be cloned. In related embodiments, the process described above can be performed using two or more (e.g., two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fifteen, etc.) nucleic acid segments.
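The selection-by-amplification logic described above can be illustrated with a minimal sketch. It assumes a simplified model in which each recombination product is an ordered tuple of segment names and a product is amplified only if it contains both primer-binding segments in the expected order; the function name `amplifiable` and the example products are hypothetical.

```python
def amplifiable(product, forward_segment, reverse_segment):
    """Return True if both primer-binding segments are present in the
    product and the forward segment lies upstream of the reverse segment."""
    if forward_segment not in product or reverse_segment not in product:
        return False
    return product.index(forward_segment) < product.index(reverse_segment)


# Hypothetical mixture of starting molecules, intermediates and product.
products = [("A",), ("A", "B"), ("B", "C"), ("A", "B", "C")]

# Primers specific for segments A and C pick out only the A-B-C product.
selected = [p for p in products if amplifiable(p, "A", "C")]
print(selected)
```

The model ignores primer orientation, amplicon length and amplification efficiency, all of which matter in an actual reaction.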

In another aspect, the invention provides methods of producing populations of hybrid nucleic acid molecules comprising (a) mixing at least a first population of nucleic acid molecules comprising one or more recombination sites with at least a second population of nucleic acid molecules comprising one or more recombination sites; and (b) causing some or all of the nucleic acid molecules of the at least first population to recombine with all or some nucleic acid molecules of the at least second population, thereby forming one or more populations of hybrid nucleic acid molecules. In certain specific embodiments of the above methods, recombination is caused by mixing the first population of nucleic acid molecules and the second population of nucleic acid molecules with one or more recombination proteins under conditions which favor their recombination. In other specific embodiments, methods of the invention further comprise mixing the first and second populations of nucleic acid molecules with at least a third population of nucleic acid molecules comprising one or more recombination sites. In additional specific embodiments, methods of the invention further comprise selecting for the population of hybrid nucleic acid molecules. In yet other specific embodiments, methods of the invention further comprise selecting for the population of hybrid nucleic acid molecules and against the first, second, and/or third populations of nucleic acid molecules. In further specific embodiments, methods of the invention further comprise selecting for or against cointegrate molecules and/or byproduct molecules.

The invention further includes populations of hybrid nucleic acid molecules produced by the above methods and populations of recombinant host cells comprising the above populations of hybrid nucleic acid molecules.

In certain embodiments, the recombination proteins used in the practice of the invention comprise one or more proteins selected from the group consisting of Cre, Int, IHF, Xis, Flp, Fis, Hin, Gin, Cin, Tn3 resolvase, TndX, XerC, XerD, and ΦC31. In specific embodiments, the recombination sites comprise one or more recombination sites selected from the group consisting of lox sites; psi sites; dif sites; cer sites; frt sites; att sites; and mutants, variants, and derivatives of these recombination sites which retain the ability to undergo recombination.

In a specific aspect, the invention allows controlled expression of fusion proteins by suppression of one or more stop codons. According to the invention, one or more starting molecules (e.g., one, two, three, four, five, seven, ten, twelve, etc.) joined by the invention may comprise one or more stop codons which may be suppressed to allow expression from a first starting molecule through the next joined starting molecule. For example, a first-second-third starting molecule joined by the invention (when each of such first and second molecules contains a stop codon) can express a tripartite fusion protein encoded by the joined molecules by suppressing each of the stop codons. Moreover, the invention allows selective or controlled fusion protein expression by varying the suppression of selected stop codons. Thus, by suppressing the stop codon between the first and second molecules but not between the second and third molecules of the first-second-third molecule, a fusion protein encoded by the first and second molecule may be produced rather than the tripartite fusion. Thus, use of different stop codons and variable control of suppression allows production of various fusion proteins or portions thereof encoded by all or different portions of the joined starting nucleic acid molecules of interest. In one aspect, the stop codons may be included anywhere within the starting nucleic acid molecule or within a recombination site contained by the starting molecule. Preferably, such stop codons are located at or near the termini of the starting molecule of interest, although such stop codons may be included internally within the molecule. In another aspect, one or more of the starting nucleic acid molecules may comprise the coding sequence of all or a portion of the target gene or open reading frame of interest wherein the coding sequence is followed by a stop codon. The stop codon may then be followed by a recombination site allowing joining of a second starting molecule. In some embodiments of this type, the stop codon may be optionally suppressed by a suppressor tRNA molecule. The genes coding for the suppressor tRNA molecule may be provided on the same vector comprising the target gene of interest, on a different vector, or in the chromosome of the host cell into which the vector comprising the coding sequence is inserted. In some embodiments, more than one copy (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc. copies) of the suppressor tRNA may be provided. In some embodiments, the transcription of the suppressor tRNA may be under the control of a regulatable (e.g., inducible or repressible) promoter.
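The way selective stop codon suppression determines which fusion protein is expressed can be shown with a minimal sketch. It assumes the joined segments are in frame, that suppression is all-or-nothing, and that each segment is followed by at most one stop codon; the function name `expressed_fusion` and the assignment of amber and opal codons to particular segments are illustrative assumptions.

```python
def expressed_fusion(segments, suppressed):
    """Return the names of the segments represented in the fusion protein.

    segments   -- ordered list of (name, stop) pairs, where stop is the
                  stop codon type following the coding sequence, or None
    suppressed -- set of stop codon types read through by suppressor tRNAs
    """
    translated = []
    for name, stop in segments:
        translated.append(name)
        # Translation terminates at the first stop codon that is not suppressed.
        if stop is not None and stop not in suppressed:
            break
    return translated


# Hypothetical first-second-third molecule with different stop codons.
joined = [("first", "amber"), ("second", "opal"), ("third", None)]

print(expressed_fusion(joined, {"amber", "opal"}))  # tripartite fusion
print(expressed_fusion(joined, {"amber"}))          # first-second fusion only
print(expressed_fusion(joined, set()))              # first segment only
```

Using a different stop codon at each junction is what allows each junction to be controlled independently in this model.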

Thus, in one aspect, the invention relates to a method of expressing one or more fusion proteins (e.g., one, two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) comprising:

(a) obtaining at least a first nucleic acid molecule comprising at least one recombination site and at least one stop codon (preferably the recombination site and/or stop codon are located at or near a terminus or termini of said first nucleic acid molecule), and a second nucleic acid molecule comprising at least one recombination site (which is preferably located at or near a terminus or termini of said second nucleic acid molecule);

(b) causing said first and second nucleic acid molecules to recombine through recombination of said recombination sites, thereby producing a third nucleic acid molecule comprising said at least one stop codon and all or a portion of said first and second molecules; and

(c) expressing one or more peptides or proteins (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) encoded by said third molecule while suppressing said at least one stop codon.

Further, recombination sites described herein (e.g., recombination sites having various recombination specificities) may contain stop codons in one, two or all three forward or reverse reading frames. Such termination codons may be suppressed as described above. Further, in appropriate instances, such recombination sites may be designed so as to eliminate stop codons in one, two and/or all three forward and/or reverse reading frames.

In another aspect, the invention provides methods of synthesizing proteins comprising (a) providing at least a first nucleic acid molecule comprising a coding sequence followed by a stop codon; (b) providing at least a second nucleic acid molecule comprising a coding sequence, optionally, followed by a stop codon; (c) causing recombination such that the nucleic acid molecules are joined; (d) inserting said joined nucleic acid molecules into a vector to produce modified vectors with the two coding sequences connected in frame; (e) transforming host cells which express suppressor tRNAs with the modified vectors; and (f) causing expression of the two coding sequences such that fusion proteins encoded by at least a portion of both of the coding sequences are produced, wherein the nucleic acid molecules of (a) and (b) are each flanked by at least one recombination site. Further, the fused nucleic acid molecules or the vector may comprise at least one suppressible stop codon (e.g., amber, opal and/or ochre codons). In addition, either the first or second nucleic acid molecule may already be present in the vector prior to application of the methods described above. In specific embodiments of the invention, the vectors and/or host cells comprise genes which encode at least one suppressor tRNA molecule. In other specific embodiments, methods of the invention further comprise transforming the host cell with a nucleic acid molecule comprising genes which encode at least one suppressor tRNA molecule. In yet other specific embodiments, the fusion proteins may comprise N- or C-terminal tags (e.g., glutathione S-transferase, β-glucuronidase, green fluorescent protein, yellow fluorescent protein, red fluorescent protein, cyan fluorescent protein, maltose binding protein, a six histidine tag, an epitope tag, etc.) encoded by at least a portion of the vector.

The invention also relates to a method of expressing one or more fusion proteins (e.g., one, two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) comprising:

(a) obtaining at least a first nucleic acid molecule comprising at least one recombination site (preferably the recombination site is located at or near a terminus or termini of said first nucleic acid molecule) and a second nucleic acid molecule comprising at least one recombination site (which is preferably located at or near a terminus or termini of said second nucleic acid molecule);

(b) causing said at least first and second nucleic acid molecules to recombine through recombination of said recombination sites, thereby producing a third nucleic acid molecule comprising all or a portion of said at least first and second molecules; and

(c) expressing one or more peptides or proteins (e.g., one, two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) encoded by said third nucleic acid molecule. In certain such embodiments, at least part of the expressed fusion protein will be encoded by the third nucleic acid molecule and at least another part will be encoded by at least part of the first and/or second nucleic acid molecules. Such a fusion protein may be produced by translation of nucleic acid which corresponds to recombination sites located between the first and second nucleic acid molecules. Thus, fusion proteins may be expressed by “reading through” mRNA corresponding to recombination sites used to connect two or more nucleic acid segments. The invention further includes fusion proteins produced by methods of the invention and mRNA which encodes such fusion proteins.

As discussed below in more detail, the methods discussed above can be used to prepare fusion proteins which are encoded by different nucleic acid segments, as well as nucleic acid molecules which encode such fusion proteins. Thus, in one general aspect, the invention provides methods for producing fusion proteins prepared by the expression of nucleic acid molecules generated by connecting two or more nucleic acid segments. In related embodiments, the invention provides methods for producing fusion RNAs prepared by the expression of nucleic acid molecules generated by connecting two or more nucleic acid segments. These RNAs may be mRNA or may be untranslated RNAs which have activities other than protein coding functions. Examples of such RNAs include ribozymes and tRNAs. The invention further provides nucleic acid molecules produced by methods of the invention, expression products of these nucleic acid molecules, methods for producing these expression products, recombinant host cells which contain these nucleic acid molecules, and methods for making these host cells. As discussed below in more detail, the invention further provides combinatorial libraries which may be screened to identify nucleic acid molecules and expression products having particular functions or activities.

In one specific aspect, the present invention provides materials and methods for joining two nucleic acid molecules or portions thereof, each of which contains at least one recombination site, into one or more product nucleic acid molecules by incubating the molecules under conditions causing the recombination of a recombination site present on one nucleic acid molecule with a recombination site present on the other nucleic acid molecule. The recombination sites are preferably located at or near the ends of the starting nucleic acid molecules. Depending on the location of the recombination sites within the starting molecules, the product molecule thus created will contain all or a portion of the first and second starting molecules joined by a recombination site (which is preferably a new recombination site). For example, recombination between an attB1 recombination site and an attP1 recombination site results in generation of attL1 and/or attR1 recombination sites.

In another specific aspect, the present invention provides materials and methods for joining two or more nucleic acid molecules (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) into one or more product nucleic acid molecules (e.g., one, two, three, four, five, seven, ten, twelve, etc.) wherein each starting nucleic acid molecule has at least one recombination site and at least one of the starting nucleic acid molecules has at least two recombination sites. The recombination sites preferably are located at or near one or both termini of the starting nucleic acid molecules. Thus, the invention provides a method of joining at least two nucleic acid molecules wherein at least a first nucleic acid molecule contains at least one recombination site and at least a second nucleic acid molecule contains two or more recombination sites. The molecules are incubated in the presence of at least one recombination protein under conditions sufficient to combine all or a portion of the starting molecules to create one or more product molecules. The product molecules thus created will contain all or a portion of each of the starting molecules joined by a recombination site (which is preferably a new recombination site).

In another specific aspect, the present invention provides a method to join at least three nucleic acid molecules (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) wherein the molecules have at least one recombination site and at least one of the starting nucleic acid molecules contains at least two recombination sites. Incubating such molecules in the presence of at least one recombination protein provides one or more product molecules (e.g., one, two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) containing all or a portion of the starting molecules, wherein each molecule is joined by a recombination site (which is preferably a new recombination site).

In another specific embodiment, the present invention provides compositions and methods for joining two or more nucleic acid molecules (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.), at least two of which (and preferably all of which) have two or more recombination sites. The recombination sites located on each molecule are preferably located at or near the ends of the starting nucleic acid molecules. According to the method of the invention, the two or more nucleic acid molecules or portions thereof are joined by a recombination reaction (e.g., by incubating the molecules in the presence of at least one recombination protein) to form one or more product molecules comprising all or a portion of each starting molecule joined by a recombination site (which is preferably a new recombination site).

In another specific aspect, the present invention provides compositions and methods for joining at least three nucleic acid molecules comprising providing at least a first, a second and a third nucleic acid molecule, wherein the first nucleic acid molecule comprises at least a first recombination site, the second nucleic acid molecule comprises at least a second and a third recombination site and the third nucleic acid molecule comprises at least a fourth recombination site, wherein the first recombination site is capable of recombining with the second recombination site and the third recombination site is capable of recombining with the fourth recombination site and conducting at least one recombination reaction such that the first and the second recombination sites recombine and the third and the fourth recombination sites recombine, thereby combining all or a portion of the molecules to make one or more product molecules.

Thus, the present invention generally relates to a method of combining n nucleic acid molecules or segments, wherein n is an integer greater than 1 (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 18, 20, 30, 40, 50, etc.), comprising the steps of providing a 1^(st) through an n^(th) nucleic acid molecule or segment, each molecule from 2 through n−1 having at least two recombination sites and molecules 1 and n having at least one recombination site (and preferably having at least two recombination sites), and contacting the molecules or segments with one or more recombination proteins (e.g., two, three, four, etc.) under conditions sufficient to cause all or a portion of the segments or molecules to recombine to form one or more product nucleic acid molecules comprising all or a portion of each 1^(st) through n^(th) molecule or segment. Joining of molecules through recombination sites (e.g., interacting a first recombination site on a first molecule with a second recombination site on a second molecule) preferably creates a new recombination site at the junction of the two molecules and may create a new recombination site at each junction where each molecule is joined to the next. For example, when joining a number of molecules (e.g., a first or “x” molecule, a second or “y” molecule, and a third or “z” molecule) when each molecule has at least two recombination sites, the first recombination site on the x molecule interacts with a second recombination site on the y molecule and the second recombination site on the x molecule interacts with a first recombination site on the z molecule to create a hybrid nucleic acid molecule comprising y:x:z joined by recombination sites. Of course, other recombination events may produce hybrid molecules comprising, for example, x:y:z, x:z:y, y:z:x, z:x:y, and/or z:y:x or fragments thereof, joined by recombination sites. Additional molecules can be added to product molecules by recombination between at least one recombination site located on another molecule with one or more recombination sites located on the product molecule (e.g., interacting a second recombination site on the z molecule with a first recombination site on an e molecule, etc. and/or interacting a first recombination site on the y molecule with a second recombination site on an f molecule, etc.). Further, the hybrid nucleic acid molecule comprising y:x:z (or other sequences as noted above) can be circularized by the interaction of recombination sites on the free ends of y and z. Addition of all or a portion of the starting molecules may be done sequentially or simultaneously.
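How site specificity fixes the order of the joined molecules can be illustrated with a minimal sketch. It assumes a simplified model in which each molecule carries a first and a second recombination site, each site is labeled with an integer specificity, every specificity is used at only one junction, and a second site joins only a first site of the same specificity. The function name `assemble` and the integer labels are hypothetical; circular products are not modeled.

```python
def assemble(molecules):
    """Return the order of molecules in the linear product.

    molecules -- dict mapping a molecule name to a pair
                 (first_site_specificity, second_site_specificity)
    """
    by_first_site = {first: name for name, (first, _) in molecules.items()}
    second_sites = {second for _, second in molecules.values()}

    # The product starts with the molecule whose first site has no partner.
    starts = [n for n, (first, _) in molecules.items() if first not in second_sites]
    if len(starts) != 1:
        raise ValueError("expected exactly one molecule with an unpaired first site")

    order = [starts[0]]
    while True:
        second = molecules[order[-1]][1]
        nxt = by_first_site.get(second)
        if nxt is None or nxt in order:
            break
        order.append(nxt)
    return order


# The second site of y pairs with the first site of x (specificity 1), and
# the second site of x pairs with the first site of z (specificity 2).
print(assemble({"x": (1, 2), "y": (0, 1), "z": (2, 3)}))
```

Changing which specificities are placed on which molecule changes the order of the product, which corresponds to the alternative arrangements (x:y:z, z:x:y and so on) noted above.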

In instances where nucleic acid segments joined by methods of the invention contain a terminus, or termini, which do not contain recombination sites, this terminus or termini may be connected to the same nucleic acid segment or another nucleic acid molecule using a ligase or a topoisomerase (e.g., a Vaccinia virus topoisomerase; see U.S. Pat. No. 5,766,891, the entire disclosure of which is incorporated herein by reference).

In addition to joining multiple molecules, the invention also provides a means to replace one or more molecules (or combinations thereof) contained in a product molecule. For instance, any one or more n molecules comprising the product molecule may be replaced or substituted by recombination with all or a portion of a different molecule (m) which comprises one or more recombination sites. Thus, in one example, m may replace x in the y:x:z molecule described above by recombining a first recombination site on m with the first recombination site flanking x (e.g., the recombination site between y and x) and recombining a second recombination site on m with the second recombination site flanking x (e.g., the recombination site between x and z), to produce y:m:z. Multiple substitutions or replacements may be made within or on any nucleic acid molecule of the invention by recombining one or more recombination sites on such molecule with one or more recombination sites within or on the molecule to be substituted. Moreover, one or more deletions (e.g., two, three, four, five, seven, ten, twelve, etc.) of various sizes on the product molecules of the invention may be accomplished by recombining two or more recombination sites within the molecule of interest for creating the deletion. For example, to create a deletion within the y:x:z (or other arrangement thereof) molecule described above, recombination of the recombination sites flanking the x molecule will create a new molecule from which x is deleted; that is, the new molecule will comprise y:z. Thus, multiple deletions, multiple replacements and combinations of deletions and replacements of various portions of a molecule of interest may be accomplished by directed recombination within the molecule of interest.

Further, the invention also provides a means to insert one or more molecules (or combinations thereof) into a product molecule. For instance, using the molecule y:x:z described above for illustration, molecule w, which comprises one or more recombination sites, may be inserted between y and x to form a new molecule: y:w:x:z. In one specific embodiment, molecule w is flanked by loxP sites and insertion of molecule w is mediated by Cre recombinase between the loxP sites on the w molecule and corresponding loxP sites on the y and x molecules. As one skilled in the art would recognize, numerous variations of the above are possible and are included within the scope of the invention. For example, molecule o, which comprises one or more recombination sites, may be inserted between y and x to form a new molecule comprising either y:o:x:z or y:o:w:x:z, depending on the starting molecule. The methods described herein can be used to insert virtually any number of molecules into other molecules. Further, these methods can be used sequentially, for example, to prepare molecules having diverse structures.
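The replacement, deletion and insertion operations on a y:x:z product can be summarized with a minimal sketch. It treats a product molecule as an ordered list of segment names and assumes each operation goes to completion at the intended recombination sites; the function names are hypothetical and the sites themselves are not modeled.

```python
def replace_segment(product, old, new):
    """Swap one segment for another, e.g. y:x:z -> y:m:z."""
    return [new if segment == old else segment for segment in product]


def delete_segment(product, target):
    """Remove a segment by recombining its flanking sites, e.g. y:x:z -> y:z."""
    return [segment for segment in product if segment != target]


def insert_between(product, left, right, new):
    """Insert a segment at the junction between two adjacent segments."""
    for i in range(len(product) - 1):
        if product[i] == left and product[i + 1] == right:
            return product[: i + 1] + [new] + product[i + 1:]
    raise ValueError("left and right segments are not adjacent in the product")


start = ["y", "x", "z"]
print(":".join(replace_segment(start, "x", "m")))
print(":".join(delete_segment(start, "x")))

# Sequential insertions: first w between y and x, then o between y and w.
with_w = insert_between(start, "y", "x", "w")
with_o = insert_between(with_w, "y", "w", "o")
print(":".join(with_w), ":".join(with_o))
```

Applying the functions one after another mirrors the sequential use of the methods described above.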

The product molecules produced by the methods of the invention may comprise any combination of starting molecules (or portions thereof) and can be any size and be in any form (e.g., circular, linear, supercoiled, etc.), depending on the starting nucleic acid molecule or segment, the location of the recombination sites on the molecule, and the order of recombination of the sites.

Importantly, the present invention provides a means by which populations of nucleic acid molecules (known or unknown) can be combined with one or more known or unknown target sequences of interest (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) or with other populations of nucleic acid molecules (known or unknown), thereby creating populations of combinatorial molecules (e.g., combinatorial libraries) from which unique and/or novel molecules (e.g., hybrid molecules) and proteins or peptides encoded by these molecules may be obtained and further analyzed.

In a preferred aspect, the population of nucleic acid molecules used to create combinatorial libraries according to the invention may comprise a population of segments or molecules having at least one (and preferably two or more) recombination sites (e.g., two, three, four, five, seven, ten, twelve, etc.). Such populations of molecules are preferably obtained from genomic or cDNA libraries (or portions thereof) or random nucleic acids, amplification products (e.g., PCR products generated with various primers) and domains (e.g., nucleic acids encoding different protein domains from the same or different proteins) constructed to contain such recombination sites. Thus, in accordance with the invention, a first population of molecules comprising recombination sites can be randomly joined or combined through recombination (by directed and/or random orientation) with at least one target sequence of interest or with a second population of molecules comprising recombination sites to produce a third population of molecules or hybrid molecules.

In accordance with the invention, multiple populations of molecules from various sources may be combined multiple times to create a new population which comprises molecules having multiple combinations of sequences. For instance, a first population, a second population and a third population can be recombined to create a fourth population comprising a random population of tripartite molecules (e.g., some or all of the molecules of the fourth population contain all or a portion of the segments from the first, second and third population).
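The size of the sequence space that such a fourth population can sample is illustrated by the following minimal sketch. It assumes directional joining in a fixed first:second:third order and enumerates every possible tripartite combination, whereas an actual recombination reaction would yield a random subset of these. The function name `tripartite_library` and the member names are hypothetical.

```python
from itertools import product


def tripartite_library(first, second, third):
    """Enumerate every first:second:third combination of the three populations."""
    return [":".join(combo) for combo in product(first, second, third)]


# Hypothetical populations with two, three and one member respectively.
library = tripartite_library(["a1", "a2"], ["b1", "b2", "b3"], ["c1"])
print(len(library), library[0], library[-1])
```

The number of possible tripartite molecules is the product of the three population sizes, which is why even modest starting populations can give large combinatorial libraries.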

In a preferred aspect, the newly created population of molecules (e.g., the third population) created by the combinatorial methods may be preferentially selected and thus separated or isolated from the original molecules (e.g., target molecules, and first and second population molecules) and from undesired product molecules (e.g., cointegrates and/or byproduct molecules). Such selection may be accomplished by assaying or selecting for the presence of a desired nucleic acid fusion (e.g., by PCR with diagnostic primers) and/or the presence of a desired activity of a protein encoded by the desired nucleic acid fusion. Such selection may also be accomplished by positive and/or negative selection. One or more toxic genes (e.g., two, three, four, five, seven, ten, etc.) are preferably used according to the invention in such a negative selection scheme.

Combinations of selection of the desired fusion product (nucleic acid and/or protein) and positive and/or negative selection may also be used in the invention. Thus, the invention provides a means for selecting a population of Product molecules (or even a specific class of product molecules or specific product molecule) created by recombinational cloning and selecting against a population of Insert Donors, Vector Donors and Cointegrates or, in similar fashion, selecting for a population of Insert Donors, Vector Donors, Byproducts and/or Cointegrates and selecting against a population of Product molecules (see FIG. 1).

Referring to FIG. 2, in the recombinatorial library methods of the invention, a first population of molecules of the invention, represented by segment A, may be provided as one population of Insert Donor molecules while a second population of molecules, represented by segment B, may be provided as a second population of Insert Donor molecules. While these segments are depicted as linear fragments, they may be provided as segments within a larger molecule, for example, as segments in a plasmid.

Those skilled in the art will appreciate that in this situation, cointegrate molecules, other than the one shown in FIG. 1, may be produced. For example, cointegrates comprising a segment A and a segment B Insert Donor molecule may be formed. In addition, cointegrates comprising segment A and/or segment B Insert Donor molecules and a Vector Donor molecule may be formed. The selection methods of the present invention permit selection against the Insert Donor molecules and against the various cointegrate molecules and for the newly created population of hybrid molecules which may be referred to as a population of Product molecules. Conversely, the selection methods may permit selection against Products and for Insert/Vector Donors, Byproducts, and/or Cointegrates.

Thus, the invention relates to a method to create a population of hybrid nucleic acid molecules comprising:

(a) mixing at least a first population of nucleic acid molecules comprising one or more recombination sites (e.g., two, three, four, five, seven, ten, twelve, etc.) with at least one target nucleic acid molecule of interest comprising one or more recombination sites (e.g., two, three, four, five, seven, ten, twelve, etc.);

(b) causing (preferably randomly) some or all of the molecules of said at least first population to recombine with all or some molecules of said target molecule of interest, thereby forming a third population of hybrid molecules; and

(c) optionally selecting specifically for said third population of hybrid molecules.

In accordance with the invention, the hybrid molecules contained by the third population preferably comprise all or a portion of a molecule obtained from the first population and all or a portion of the target molecule. The molecules may be joined in a directed or random orientation, depending on the need.

In one aspect, the target molecule used to produce said third population described above can be a DNA binding domain or a transcription activation domain, such that the third population of hybrid molecules can be used in 2-hybrid screening methods well known in the art.

The invention more specifically relates to a method of creating a population of combinatorial molecules comprising:

(a) obtaining at least a first population of nucleic acid molecules comprising one or more recombination sites (e.g., two, three, four, five, seven, ten, twelve, etc.) and at least a second population of nucleic acid molecules comprising one or more recombination sites (e.g., two, three, four, five, seven, ten, twelve, etc.);

(b) causing (preferably randomly) some or all of the molecules of at least said first population to recombine with some or all of the molecules of at least said second population, thereby creating a third population of hybrid molecules; and

(c) optionally selecting specifically for said third population of hybrid molecules.

In accordance with the invention, each or many of the hybrid molecules contained by the third population preferably comprises all or a portion of a molecule obtained from the first population and all or a portion of a molecule obtained from the second population. The molecules may be joined in a directed or random orientation, depending on the need.

Populations of nucleic acid molecules used in accordance with the combinatorial methods of the invention can comprise synthetic, genomic, or cDNA libraries (or portions thereof), random synthetic sequences or degenerate oligonucleotides, domains and the like. Preferably, the population of nucleic acid molecules used comprises a random population of molecules, each having at least two recombination sites which preferably do not recombine with each other and which are preferably located at or near both termini of each molecule. Random recombination of populations of molecules by the methods of the invention provides a powerful technique for generating populations of molecules having significant sequence diversity. For example, recombination of a first library having about 10⁶ sequences with a second population having about 10⁶ sequences results in a third population having about 10¹² sequences.
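The library-size arithmetic in the preceding example can be illustrated with a short sketch. The function name is hypothetical, and the calculation assumes that every member of one population can pair with every member of the other in a single fixed orientation, so it gives an upper bound rather than a measured yield.

```python
def combined_library_size(*population_sizes: int) -> int:
    """Upper bound on distinct hybrids when one member is drawn from each population.

    Assumption (not from the specification): all pairings are possible and
    each pairing yields exactly one hybrid molecule.
    """
    total = 1
    for size in population_sizes:
        if size < 0:
            raise ValueError("population sizes must be non-negative")
        total *= size
    return total


# Two populations of about 10^6 members each give about 10^12 hybrids.
print(combined_library_size(10**6, 10**6))
```

Because the bound is a simple product, adding a third population of comparable size would raise it by a further factor of about 10⁶ under the same assumptions.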

The invention further provides methods for preparing and screening combinatorial libraries in which segments of the nucleic acid molecules of the library members have been altered. Such alterations include mutation, shuffling, insertion, and/or deletion of nucleic acid segments. In particular, the invention provides methods for preparing nucleic acid libraries which contain members having such alterations and methods for introducing such alterations in existing libraries. In a related aspect, the invention includes combinatorial libraries produced by methods of the invention, methods for screening such libraries to identify members which encode expression products having particular functions or activities, and expression products of these libraries (e.g., RNA, proteins, etc.).

Further, in aspects related to those described above, the invention provides methods for generating populations of nucleic acid molecules containing one or more (e.g., one, two, three, four, five, ten, fifteen) nucleic acid segments which are the same and one or more nucleic acid segments which are derived from members of one or more populations of nucleic acid molecules. One method for producing such nucleic acid molecules involves the use of a vector which contains two recombination sites. A first nucleic acid segment, which encodes a protein having a particular function or activity (e.g., signal peptide activity, DNA binding activity, affinity for a particular ligand, etc.), is inserted in the first recombination site and a second nucleic acid segment, which is derived from a population of nucleic acid molecules, is inserted into the second recombination site. Further, these nucleic acid segments are operably linked to a sequence which regulates transcription, thereby producing a fusion peptide and an RNA molecule produced by the fusion sequence. The resulting combinatorial library may then be screened to identify nucleic acid molecules which encode expression products having particular functions or activities (e.g., transcriptional activation activity; DNA binding activity; the ability to form multimers; localization to a sub-cellular compartment, such as the endoplasmic reticulum, the nucleus, mitochondria, chloroplasts, the cell membrane, etc.; etc.). When three or more (e.g., three, four, five, six, eight, ten, etc.) nucleic acid segments are used in methods such as those described above, one or more of the nucleic acid segments may be kept constant and one or more of the nucleic acid segments may be derived from members of one or more populations of nucleic acid molecules.
For example, in constructing a four-part molecule, represented by A-B-C-D, A and D may be known molecules having known functions (e.g., tags such as HIS6, promoters, transcription or translation signals, selectable markers, etc.) while molecules B and C may be derived from one or more populations of nucleic acid molecules.
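The four-part arrangement A-B-C-D can be sketched as an enumeration in which A and D are held constant and B and C are drawn from populations. This is only an illustration of the combinatorics: the function name and the segment labels (`B1`, `C1`, `T7term`, and so on) are hypothetical placeholders, and real segments would be nucleic acid sequences joined through recombination sites.

```python
from itertools import product


def four_part_constructs(a, b_population, c_population, d):
    """List every A-B-C-D combination with A and D fixed.

    Segments are represented by labels only; this models which combinations
    are possible, not the recombination reaction itself.
    """
    return ["-".join((a, b, c, d)) for b, c in product(b_population, c_population)]


constructs = four_part_constructs(
    "HIS6",                # constant A segment (a tag)
    ["B1", "B2"],          # hypothetical members of population B
    ["C1", "C2", "C3"],    # hypothetical members of population C
    "T7term",              # constant D segment (hypothetical label)
)
for name in constructs:
    print(name)
```

With two B members and three C members the sketch lists six constructs, which mirrors how the diversity of the variable positions multiplies while the constant positions contribute no additional variation.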

Any of the product molecules of the invention may be further manipulated, analyzed or used in any number of standard molecular biology techniques or combinations of such techniques (in vitro or in vivo). These techniques include sequencing, amplification, nucleic acid synthesis, making RNA transcripts (e.g., through transcription of product molecules using RNA polymerase promoters such as T7 or SP6 promoters), protein or peptide expression (for example, fusion protein expression, antibody expression, hormone expression, etc.), protein-protein interactions (2-hybrid or reverse 2-hybrid analysis), homologous recombination or gene targeting, and combinatorial library analysis and manipulation. The invention also relates to cloning the nucleic acid molecules of the invention (preferably by recombination) into one or more vectors (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) or converting the nucleic acid molecules of the invention into a vector by the addition of certain functional vector sequences (e.g., origins of replication). In a preferred aspect, recombination is accomplished in vitro (e.g., in cell-free systems) and further manipulation or analysis is performed directly in vitro. Thus, further analysis and manipulation will not be constrained by the ability to introduce the molecules of the invention into a host cell and/or maintain them in a host cell. Accordingly, time may be saved and throughput increased by further manipulating or analyzing the molecules of the invention directly in vitro. Alternatively, in vitro analysis or manipulation can be done after passage through host cells or can be done directly in vivo (e.g., while in the host cells, tissues, organs, or organisms).

Nucleic acid synthesis steps, according to the invention, may comprise:

(a) mixing a nucleic acid molecule of interest or template with one or more primers (e.g., one, two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) and one or more nucleotides (e.g., one, two, three, or four) to form a mixture; and

(b) incubating said mixture under conditions sufficient to synthesize a nucleic acid molecule complementary to all or a portion of said molecule or template.

The synthesized molecule may then be used as a template for further synthesis of a nucleic acid molecule complementary to all or a portion of the first synthesized molecule. Accordingly, a double stranded nucleic acid molecule (e.g., DNA) may be prepared. Preferably, such second synthesis step is performed in the presence of one or more primers and one or more nucleotides under conditions sufficient to synthesize the second nucleic acid molecule complementary to all or a portion of the first nucleic acid molecule. Typically, synthesis of one or more nucleic acid molecules (e.g., one, two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) is performed in the presence of one or more polymerases (preferably DNA polymerases which may be thermostable or mesophilic), although reverse transcriptases may also be used in such synthesis reactions. Accordingly, the nucleic acid molecules used as templates for the synthesis of additional nucleic acid molecules may be RNA, mRNA, DNA or non-natural or derivative nucleic acid molecules. Nucleic acid synthesis, according to the invention, may be facilitated by incorporating one or more primer sites (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) into the product molecules through the use of starting nucleic acid molecules containing such primer sites. Thus, by the methods of the invention, primer sites may be added at one or a number of desired locations in the product molecules, depending on the location of the primer site within the starting molecule and the order of addition of the starting molecule in the product molecule.

Sequencing steps, according to the invention, may comprise:

(a) mixing a nucleic acid molecule to be sequenced with one or more primers (e.g., one, two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.), one or more nucleotides (e.g., one, two, three, or four) and one or more termination agents (e.g., one, two, three, four, or five) to form a mixture;

(b) incubating said mixture under conditions sufficient to synthesize a population of molecules complementary to all or a portion of said molecule to be sequenced; and

(c) separating said population to determine the nucleotide sequence of all or a portion of said molecule to be sequenced.

Such sequencing steps are preferably performed in the presence of one or more polymerases (e.g., DNA polymerases and/or reverse transcriptases) and one or more primers. Preferred terminating agents for sequencing include derivative nucleotides such as dideoxynucleotides (ddATP, ddTTP, ddGTP, ddCTP and derivatives thereof). Nucleic acid sequencing, according to the invention, may be facilitated by incorporating one or more sequencing primer sites (e.g., one, two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) into the product molecules through the use of starting nucleic acid molecules containing such primer sites. Thus, by the methods of the invention, sequencing primer sites may be added at one or a number of desired locations in the product molecules, depending on the location of the primer site within the starting molecule and the order of addition of the starting molecule in the product molecule.

Protein expression steps, according to the invention, may comprise:

(a) obtaining a nucleic acid molecule to be expressed which comprises one or more expression signals (e.g., one, two, three, or four); and

(b) expressing all or a portion of the nucleic acid molecule under control of said expression signal, thereby producing a peptide or protein encoded by said molecule or portion thereof.

In this context, the expression signal may be said to be operably linked to the sequence to be expressed. The protein or peptide expressed can be expressed in a host cell (in vivo), although expression may be conducted in vitro (e.g., in cell-free expression systems) using techniques well known in the art. Upon expression of the protein or peptide, the protein or peptide product may optionally be isolated or purified. Moreover, the expressed protein or peptide may be used in various protein analysis techniques including 2-hybrid interaction, protein functional analysis, and agonist/antagonist-protein interactions (e.g., stimulation or inhibition of protein function through drugs, compounds or other peptides). Further, expressed proteins or peptides may be screened to identify those which have particular biological activities. Examples of such activities include binding affinity for nucleic acid molecules (e.g., DNA or RNA) or proteins or peptides. In particular, expressed proteins or peptides may be screened to identify those with binding affinity for other proteins or themselves. Proteins or peptides which have binding affinities for themselves will generally be capable of forming multimers or aggregates. Proteins or peptides which have binding affinities for themselves and/or other proteins will often be capable of forming or participating in the formation of multi-protein complexes such as antibodies, spliceosomes, multi-subunit enzymes, ribosomes, etc. Further included within the scope of the invention are the expressed proteins or peptides described above, nucleic acid molecules which encode these proteins, methods for making these nucleic acid molecules, methods for producing recombinant host cells which contain these nucleic acid molecules, recombinant host cells produced by these methods, and methods for producing the expressed proteins or peptides.

The novel and unique hybrid proteins or peptides (e.g., fusion proteins) produced by the invention, and particularly from expression of the combinatorial molecules of the invention, may generally be useful for any number of applications. More specifically, as one skilled in the art would recognize, hybrid proteins or peptides of the invention may be designed and selected to identify those which perform virtually any function. Examples of applications for which these proteins may be used include therapeutics, industrial manufacturing (e.g., microbial synthesis of amino acids or carbohydrates), small molecule identification and purification (e.g., by affinity chromatography), etc.

Protein expression, according to the invention, may be facilitated by incorporating one or more transcription or translation signals (e.g., one, two, three, four, five, seven, ten, twelve, fifteen, etc.) or regulatory sequences, start codons, termination signals, splice donor/acceptor sequences (e.g., intronic sequences) and the like into the product molecules through the use of starting nucleic acid molecules containing such sequences. Thus, by the methods of the invention, expression sequences may be added at one or a number of desired locations in the product molecules, depending on the location of such sequences within the starting molecule and the order of addition of the starting molecule in the product molecule.

In another aspect, the invention provides methods for performing homologous recombination between nucleic acid molecules comprising (a) mixing at least a first nucleic acid molecule which comprises one or more recombination sites with at least one target nucleic acid molecule, wherein the first and target nucleic acid molecules have one or more homologous sequences; and (b) causing the first and target nucleic acid molecules to recombine by homologous recombination. In specific embodiments of the invention, the homologous recombination methods of the invention result in transfer of all or a portion of the first nucleic acid molecule into the target nucleic acid molecule. In certain specific embodiments of the invention, the first nucleic acid molecule comprises two or more sequences which are homologous to sequences of the target nucleic acid molecule. In other specific embodiments, the homologous sequences of the first nucleic acid molecule flank at least one selectable marker and/or one or more recombination sites. In yet other specific embodiments, the homologous sequences of the first nucleic acid molecule flank at least one selectable marker flanked by recombination sites. In additional specific embodiments, the homologous sequences of the first nucleic acid molecule flank a nucleic acid segment which regulates transcription.

Further, homologous recombination, according to the invention, may comprise:

(a) mixing at least a first nucleic acid molecule of the invention (which is preferably a product molecule) comprising one or more recombination sites (e.g., one, two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) with at least one target nucleic acid molecule (e.g., one, two, three, four, five, seven, ten, twelve, etc.), wherein said first and target molecules have one or more homologous sequences (e.g., one, two, three, four, five, seven, etc.); and

(b) causing said first and target nucleic acid molecules to recombine by homologous recombination.

Such homologous recombination may occur in vitro (e.g., in cell-free systems), but preferably is accomplished in vivo (e.g., in a host cell). Preferably, homologous recombination causes transfer of all or a portion of a nucleic acid molecule of the invention containing recombination sites (the first nucleic acid molecule) into one or more positions of the target nucleic acid molecule containing homologous sequences (e.g., one, two, three, four, five, seven, etc.). Selection of such homologous recombination may be facilitated by positive or negative selection (e.g., using selectable markers) to select for a desired product and/or against an undesired product. In a preferred aspect, the nucleic acid molecule of the invention comprises at least one selectable marker and at least two sequences which are homologous to the target molecule. Preferably, the first molecule comprises at least two homologous sequences flanking at least one selectable marker.

The present invention thus facilitates construction of gene targeting nucleic acid molecules or vectors which may be used to knock out or mutate a sequence or gene of interest (or alter existing sequences, for example to convert a mutant sequence to a wild-type sequence), particularly genes or sequences within a host or host cells such as animals (including humans), plants, insects, bacteria, yeast, and the like, or sequences of adventitious agents such as viruses within such host or host cells. Such gene targeting may preferably comprise targeting a sequence on the genome of such host cells. Such gene targeting may be conducted in vitro (e.g., in a cell-free system) or in vivo (e.g., in a host cell). Thus, in a preferred aspect, the invention relates to a method of targeting or mutating a nucleotide sequence or a gene comprising:

(a) obtaining at least one nucleic acid molecule of the invention comprising one or more recombination sites (and preferably one or more selectable markers) wherein said molecule comprises one or more nucleotide sequences homologous to the target gene or nucleotide sequence of interest (said one or more homologous sequences preferably flank one or more selectable markers (e.g., one, two, three, four, five, seven, ten, etc.) on the molecule of the invention); and

(b) contacting said molecule with one or more target genes or nucleotide sequences of interest (e.g., one, two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) under conditions sufficient to cause homologous recombination at one or more sites (e.g., one, two, three, four, five, seven, ten, etc.) between said target nucleotide sequence or gene of interest and said molecule of the invention, thereby causing insertion of all or a portion of the molecule of the invention within the target nucleotide sequence or gene.

Such targeting method may cause deletion, activation, inactivation, partial inactivation, or partial activation of the target nucleic acid or gene such that an expression product (typically a protein or peptide) normally expressed by the target nucleic acid or gene is not produced, is produced at a higher or lower level, or, to the extent produced, has an altered protein sequence which may result in more or less activity or in an inactive or partially inactive expression product. The selectable marker preferably present on the molecule of the invention facilitates selection of candidates (for example, host cells) in which the homologous recombination event was successful. Thus, the present invention provides a method to produce host cells, tissues, organs, and animals (e.g., transgenic animals) containing the modified nucleic acid or gene produced by the targeting methods of the invention. The modified nucleic acid or gene preferably comprises at least one recombination site and/or at least one selectable marker provided by the nucleic acid molecule of the invention.

Thus, the present invention more specifically relates to a method of targeting or mutating a nucleic acid or a gene comprising:

(a) obtaining at least one nucleic acid molecule of the invention comprising one or more recombination sites (e.g., one, two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) and at least one selectable marker (e.g., one, two, three, four, five, seven, ten, etc.) flanked by one or more sequences homologous to the target nucleic acid or gene of interest (e.g., one, two, three, four, five, seven, ten, etc.);

(b) contacting said molecule with one or more target nucleic acids or genes of interest (e.g., one, two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) under conditions sufficient to cause homologous recombination at one or more sites between the target nucleic acid or gene of interest and the nucleic acid molecule, thereby causing insertion of all or a portion of the nucleic acid molecule of the invention (and preferably causing insertion of at least one selectable marker and/or at least one recombination site) within the target nucleic acid or gene of interest; and

(c) optionally selecting for the target nucleic acid or gene of interest comprising all or a portion of the nucleic acid molecule of the invention or for a host cell containing the target nucleic acid or gene containing all or a portion of the nucleic acid molecule of the invention.

Preferably, selectable markers used in the methods described above are positive selection markers (e.g., antibiotic resistance markers such as ampicillin, tetracycline, kanamycin, neomycin, and G-418 resistance markers).

In one general aspect, the invention provides methods for targeting or mutating a target gene or nucleotide sequence comprising (a) obtaining at least one first nucleic acid molecule comprising one or more recombination sites and one or more selectable markers, wherein the first nucleic acid molecule comprises one or more nucleotide sequences homologous to the target gene or nucleotide sequence; and (b) contacting the first nucleic acid molecule with one or more target genes or nucleotide sequences under conditions sufficient to cause homologous recombination at one or more sites between the target gene or nucleotide sequence and the first nucleic acid molecule, thereby causing insertion of all or a portion of the first nucleic acid molecule within the target gene or nucleotide sequence. In certain specific embodiments of the invention, the first nucleic acid molecule comprises at least one selectable marker flanked by the homologous sequences. In other specific embodiments, the selectable marker is flanked by the homologous sequences. In additional specific embodiments, the target gene or nucleotide sequence is inactivated as a result of the homologous recombination. In yet additional specific embodiments, methods of the invention further comprise selecting for a host cell containing the target gene or nucleotide sequence.

In some specific embodiments, one or more of the one or more nucleotide sequences of the first nucleic acid molecule which are homologous to the target gene or nucleotide sequence will not be 100% identical to the target gene or nucleotide sequence. In other words, the nucleic acid segments which facilitate homologous recombination need not necessarily share 100% sequence identity. However, in general, these nucleic acid segments will share at least 70% identity (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99%) in their regions of homology.
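As an illustration of the identity thresholds listed above, percent identity over a region of homology can be computed position by position. The sketch below is a simplification under stated assumptions: the two sequences are taken to be already aligned and of equal length, gaps are not handled, and the function names and example sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(seq_a.upper(), seq_b.upper()) if x == y)
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(seq_a: str, seq_b: str, threshold: float = 70.0) -> bool:
    """True when the aligned region reaches the given identity threshold (default 70%)."""
    return percent_identity(seq_a, seq_b) >= threshold


# Hypothetical 10-nucleotide regions differing at the last two positions.
targeting_arm = "ACGTACGTAC"
target_locus = "ACGTACGTTT"
print(percent_identity(targeting_arm, target_locus))
print(meets_identity_threshold(targeting_arm, target_locus))
```

In practice, identity between homologous segments would normally be taken from a proper pairwise alignment; the position-wise comparison here only shows how the percentage itself is defined.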

The use of nucleic acid segments to facilitate homologous recombination which do not share 100% sequence identity to the nucleic acid with which they are to recombine (i.e., the target gene or nucleotide sequence) can be advantageous in a number of instances. One example of such an instance is where the homologous nucleic acids correspond to part of a target nucleotide sequence which is a gene and homologous recombination results in the introduction of one or more sequence alterations in the target nucleotide sequence. In a related example, the homologous nucleic acids may correspond to a target nucleotide sequence which represents an entire gene. Thus, homologous recombination results in replacement of the target gene. Another example of such an instance is where one seeks to perform homologous recombination on an organism which has different nucleotide sequences at the site where homologous recombination is to occur as compared to the one or more homologous nucleotide sequences of the first nucleic acid molecule. The differences in these sequences may result, for example, when an organism in which homologous recombination is intended to occur is of a different strain or species than the organism from which the homologous nucleotide sequences of the first nucleic acid molecule are obtained or where the organism has a different genotype at the recombination locus.

Further, the length of the homologous regions which facilitate recombination can vary but will generally be at least 15 nucleotides (e.g., at least 20 nucleotides, at least 50 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 400 nucleotides, at least 600 nucleotides, at least 800 nucleotides, at least 1000 nucleotides, at least 1500 nucleotides, at least 2000 nucleotides, at least 2500 nucleotides, at least 3000 nucleotides, at least 3500 nucleotides, at least 4000 nucleotides, at least 4500 nucleotides, at least 5000 nucleotides, at least 5500 nucleotides, at least 6000 nucleotides, etc.).

The invention further provides recombinant host cells produced by the methods described herein, which may be prokaryotic (e.g., bacteria), or eukaryotic (e.g., fungal (e.g., yeasts), plant, or animal (e.g., insect, mammalian including human, etc.) hosts).

In another aspect of the invention, recombination sites introduced into targeted nucleic acids or genes according to the invention may be used to excise, replace, or remove all or a portion of the nucleic acid molecule inserted into the target nucleic acid or gene of interest. Thus, the invention allows for in vitro or in vivo removal of such nucleic acid molecules and thus may allow for reactivation of the target nucleic acid or gene. In some embodiments, after identification and isolation of a nucleic acid or gene containing the alterations introduced as above, a selectable marker present on the molecule of the present invention may be removed.

The present invention also provides methods for cloning the starting or product nucleic acid molecules of the invention into one or more vectors or converting the product molecules of the invention into one or more vectors. In one aspect, the starting molecules are recombined to make one or more product molecules and such product molecules are cloned (preferably by recombination) into one or more vectors. In another aspect, the starting molecules are cloned directly into one or more vectors such that a number of starting molecules are joined within the vector, thus creating a vector containing the product molecules of the invention. In another aspect, the starting molecules are cloned directly into one or more vectors such that the starting molecules are not joined within the vector (i.e., the starting molecules are separated by vector sequences). In yet another aspect, a combination of product molecules and starting molecules may be cloned in any order into one or more vectors, thus creating a vector comprising a new product molecule resulting from a combination of the original starting and product molecules.

Thus, the invention relates to a method of cloning comprising:

(a) obtaining at least one nucleic acid molecule of the invention (e.g., one, two, three, four, five, seven, ten, twelve, etc.) comprising recombination sites; and

(b) transferring all or a portion of said molecule into one or more vectors (e.g., one, two, three, four, five, seven, ten, twelve, fifteen, etc.).

Preferably, such vectors comprise one or more recombination sites (e.g., one, two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) and the transfer of the molecules into such vectors is preferably accomplished by recombination between one or more sites on the vectors (e.g., one, two, three, four, five, seven, ten, etc.) and one or more sites on the molecules of the invention (e.g., one, two, three, four, five, seven, ten, etc.). In another aspect, the product molecules of the invention may be converted to molecules which function as vectors by including the necessary vector sequences (e.g., origins of replication). Thus, according to the invention, such vector sequences may be incorporated into the product molecules through the use of starting molecules containing such sequences. Such vector sequences may be added at one or a number of desired locations in the product molecules, depending on the location of the sequence within the starting molecule and the order of addition of the starting molecules in the product molecule. Thus, the invention allows custom construction of a desired vector by combining (preferably through recombination) any number of functional elements that may be desired into the vector. The product molecule containing the vector sequences may be in linear form or may be converted to a circular or supercoiled form by causing recombination of recombination sites within the product molecule or by ligation techniques well known in the art. Preferably, circularization of such product molecule is accomplished by recombining recombination sites at or near both termini of the product molecule or by ligating the termini of the product molecule to circularize the molecule. As will be recognized, linear or circular product molecules can be introduced into one or more hosts or host cells for further manipulation.

Vector sequences useful in the invention, when employed, may comprise one or a number of elements and/or functional sequences and/or sites (or combinations thereof) including one or more sequencing or amplification primer sites (e.g., one, two, three, four, five, seven, ten, etc.), one or more sequences which confer translation termination suppressor activities (e.g., one, two, three, four, five, seven, ten, etc.) such as sequences which encode suppressor tRNA molecules, one or more selectable markers (e.g., one, two, three, four, five, seven, or ten toxic genes, antibiotic resistance genes, etc.), one or more transcription or translation sites or signals (e.g., one, two, three, four, five, seven, ten, etc.), one or more transcription or translation termination sites (e.g., one, two, three, four, five, seven, ten, twelve, etc.), one or more splice sites (e.g., one, two, three, four, five, seven, ten, etc.) which allow for the excision, for example, of RNA corresponding to recombination sites or protein translated from such sites, one or more tag sequences (e.g., HIS6, GST, GUS, GFP, YFP, CFP, epitope tags, etc.), one or more restriction enzyme sites (e.g., multiple cloning sites), one or more origins of replication (e.g., one, two, three, four, five, seven, ten, etc.), one or more recombination sites (or portions thereof) (e.g., one, two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.), etc. The vector sequences used in the invention may also comprise stop codons which may be suppressed to allow expression of desired fusion proteins as described herein. Thus, according to the invention, vector sequences may be used to introduce one or more of such elements, functional sequences and/or sites into any of the nucleic acid molecules of the invention, and such sequences may be used to further manipulate or analyze such nucleic acid molecules.
Forexample, primer sites provided by a vector (preferably located on bothsides of the insert cloned in such vector) allow sequencing oramplification of all or a portion of a product molecule cloned into thevector.

Additionally, transcriptional or regulatory sequences contained by the vector allow expression of peptides, polypeptides or proteins encoded by all or a portion of the product molecules cloned into the vector. Likewise, genes, portions of genes or sequence tags (such as GUS, GST, GFP, YFP, CFP, His tags, epitope tags and the like) provided by the vectors allow creation of populations of gene fusions with the product molecules cloned in the vector or allow production of a number of peptide, polypeptide or protein fusions encoded by the sequence tags provided by the vector in combination with the product sequences cloned in such vector. Such genes, portions of genes or sequence tags may be used in combination with optionally suppressed stop codons to allow controlled expression of fusion proteins encoded by the sequence of interest being cloned into the vector and the vector supplied gene or tag sequence.

In a construct, the vector may comprise one or more recombination sites, one or more stop codons and one or more tag sequences. In some embodiments, the tag sequences may be adjacent to a recombination site. Optionally, a suppressible stop codon may be incorporated into the sequence of the tag or in the sequence of the recombination site in order to allow controlled addition of the tag sequence to the gene of interest. In embodiments of this type, the gene of interest may be inserted into the vector by recombinational cloning such that the tag and the coding sequence of the gene of interest are in the same reading frame.

The gene of interest may be provided with translation initiation signals (e.g., Shine-Dalgarno sequences, Kozak sequences and/or IRES sequences) in order to permit the expression of the gene with a native N-terminus when the stop codon is not suppressed. Further, recombination sites which reside between nucleic acid segments which encode components of fusion proteins may be designed either to not encode stop codons or to not encode stop codons in the fusion protein reading frame. The gene of interest may also be provided with a stop codon (e.g., a suppressible stop codon) at the 3′-end of the coding sequence. Similarly, when a fusion protein is produced from multiple nucleic acid segments (e.g., three, four, five, six, eight, ten, etc. segments), nucleic acid which encodes stop codons can be omitted between each nucleic acid segment and, if desired, nucleic acid which encodes a stop codon can be positioned at the 3′ end of the fusion protein coding region.
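The reading-frame and stop-codon conditions described above can be checked mechanically. The following Python fragment is a minimal sketch only and is not part of the disclosed methods: the function names and the short tag and site sequences are hypothetical placeholders, and the check assumes the standard stop codons TAA, TAG and TGA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a sequence into complete codons, reading from position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def fusion_in_frame(tag, site, allowed_stops=frozenset()):
    """Return True if a coding sequence placed directly after tag + site
    continues the reading frame of the tag, and the tag and site contain
    no stop codon in that frame other than those listed in allowed_stops
    (e.g., a stop codon that is meant to be suppressed)."""
    upstream = tag + site
    if len(upstream) % 3 != 0:
        return False  # the downstream gene would be shifted out of frame
    return all(c not in STOP_CODONS or c in allowed_stops
               for c in codons(upstream))


# Hypothetical 9-nucleotide tag and toy "recombination site" sequences.
TAG_SEQ = "ATGCATCAT"
print(fusion_in_frame(TAG_SEQ, "GGTTCTGGT"))           # in frame, no stop
print(fusion_in_frame(TAG_SEQ, "GGTTCTGG"))            # frame shifted
print(fusion_in_frame(TAG_SEQ, "GGTTAAGGT"))           # in-frame stop
print(fusion_in_frame(TAG_SEQ, "GGTTAGGGT", {"TAG"}))  # suppressible stop
```

In this sketch a suppressible stop codon is tolerated only when it is passed explicitly in allowed_stops, which mirrors the optional placement of such a codon in the tag or in the recombination site.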

In some embodiments, a tag sequence may be provided at both the N- and C-termini of the gene of interest. Optionally, the tag sequence at the N-terminus may be provided with a stop codon and the gene of interest may be provided with a stop codon and the tag at the C-terminus may be provided with a stop codon. The stop codons may be the same or different.

In some embodiments, the stop codon of the N-terminal tag is different from the stop codon of the gene of interest. In embodiments of this type, suppressor tRNAs corresponding to one or both of the stop codons may be provided. When both are provided, each of the suppressor tRNAs may be independently provided on the same vector, on a different vector, or in the host cell genome. The suppressor tRNAs need not both be provided in the same way; for example, one may be provided on the vector containing the gene of interest while the other may be provided in the host cell genome.

Depending on the location of the expression signals (e.g., promoters), suppression of the stop codon(s) during expression allows production of a fusion peptide having the tag sequence at the N- and/or C-terminus of the expressed protein. By not suppressing the stop codon(s), expression of the sequence of interest without the N- and/or C-terminal tag sequence may be accomplished. Thus, the invention allows, through recombination, efficient construction of vectors containing a gene or sequence of interest (e.g., one, two, three, four, five, six, ten, or more ORFs) for controlled expression of fusion proteins depending on the need.
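The effect of suppressing or not suppressing a stop codon can be illustrated with a toy translation routine. The sketch below is illustrative only: the construct (a short gene, an amber stop codon, a three-histidine tag and a final stop codon) is hypothetical, and the amino acid inserted by the suppressor tRNA (here glutamine, "Q") is an assumption made for the example. The codon table is the standard genetic code.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(orf, suppressed=None):
    """Translate orf from position 0. `suppressed` maps a stop codon to the
    amino acid inserted by a suppressor tRNA; any other stop codon ends
    translation."""
    suppressed = suppressed or {}
    protein = []
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        residue = CODON_TABLE[codon]
        if residue == "*":
            if codon in suppressed:
                protein.append(suppressed[codon])  # read through the stop
                continue
            break  # unsuppressed stop codon terminates the protein
        protein.append(residue)
    return "".join(protein)


# Hypothetical construct: gene (M-A-K), amber stop, His tag, final stop.
construct = "ATGGCTAAA" + "TAG" + "CATCATCAT" + "TAA"

print(translate(construct))                # MAK     (untagged protein)
print(translate(construct, {"TAG": "Q"}))  # MAKQHHH (C-terminal fusion)
```

Because the final stop codon (TAA) differs from the suppressed codon (TAG), the fusion still terminates after the tag, which corresponds to the use of different stop codons for the gene and the tag.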

Preferably, the starting nucleic acid molecules or product molecules of the invention which are cloned or constructed according to the invention comprise at least one open reading frame (ORF) (e.g., one, two, three, four, five, seven, ten, twelve, or fifteen ORFs). Such starting or product molecules may also comprise functional sequences (e.g., primer sites, transcriptional or translation sites or signals, termination sites (e.g., stop codons which may be optionally suppressed), origins of replication, and the like), and preferably comprise sequences that regulate gene expression including transcriptional regulatory sequences and sequences that function as internal ribosome entry sites (IRES). Preferably, at least one of the starting or product molecules and/or vectors comprises sequences that function as a promoter. Such starting or product molecules and/or vectors may also comprise transcription termination sequences, selectable markers, restriction enzyme recognition sites, and the like.

In some embodiments, the starting or product molecules and/or vectors comprise two copies of the same selectable marker, each copy flanked by two recombination sites. In other embodiments, the starting or product molecules and/or vectors comprise two different selectable markers each flanked by two recombination sites. In some embodiments, one or more of the selectable markers may be a negative selectable marker (e.g., ccdB, kicB, Herpes simplex thymidine kinase, cytosine deaminase, etc.).

In one aspect, the invention provides methods of cloning nucleic acid molecules comprising (a) providing a first nucleic acid segment flanked by a first and a second recombination site; (b) providing a second nucleic acid segment flanked by a third and a fourth recombination site, wherein either the first or the second recombination site is capable of recombining with either the third or the fourth recombination site; (c) conducting a recombination reaction such that the two nucleic acid segments are recombined into a single nucleic acid molecule; and (d) cloning the single nucleic acid molecule. In certain specific embodiments of these methods, the first recombination site is not capable of recombining with the second and fourth recombination sites and the second recombination site is not capable of recombining with the first and third recombination sites.
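The requirement that only selected pairs of sites be able to recombine can be expressed as a simple compatibility test. The sketch below is a simplified model rather than a description of any particular recombination system: it assumes att-type sites in which attB pairs with attP and attL pairs with attR, and in which two sites react only if they carry the same specificity label. The Site type, the function names and the example segments are hypothetical.

```python
from typing import NamedTuple


class Site(NamedTuple):
    kind: str  # "B", "P", "L" or "R", as in attB, attP, attL, attR
    spec: int  # specificity label, e.g. 1 for attL1


# Simplifying assumption: B pairs with P, and L pairs with R.
PARTNER_KINDS = {("B", "P"), ("P", "B"), ("L", "R"), ("R", "L")}


def can_recombine(a, b):
    """Two sites react only if their kinds are partners and their
    specificity labels match."""
    return (a.kind, b.kind) in PARTNER_KINDS and a.spec == b.spec


def reactive_pairs(first_segment, second_segment):
    """List every (site on first segment, site on second segment) pair
    that is capable of recombining."""
    return [(a, b) for a in first_segment for b in second_segment
            if can_recombine(a, b)]


# Hypothetical segments: attL1-segment-attR3 and attL3-segment-attL2.
segment_one = (Site("L", 1), Site("R", 3))
segment_two = (Site("L", 3), Site("L", 2))

pairs = reactive_pairs(segment_one, segment_two)
print(pairs)  # only the attR3 / attL3 junction can form
```

In this example the two segments can be joined at exactly one junction, leaving the sites with specificities 1 and 2 unreacted and available for a later cloning step into a vector.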

In a specific aspect, the invention provides a method of cloning comprising providing at least a first nucleic acid molecule comprising at least a first and a second recombination site and at least a second nucleic acid molecule comprising at least a third and a fourth recombination site, wherein either the first or the second recombination site is capable of recombining with either the third or the fourth recombination site and conducting a recombination reaction such that the two nucleic acid molecules are recombined into one or more product nucleic acid molecules and cloning the product nucleic acid molecules into one or more vectors. Preferably, the recombination sites flank the first and/or second nucleic acid molecules. Moreover, the cloning step is preferably accomplished by the recombination reaction of the product molecule into a vector comprising one or more recombination sites, although such cloning steps may be accomplished by standard ligation reactions well known in the art. In one aspect, the cloning step comprises conducting a recombination reaction between the sites in the product nucleic acid molecule that did not react in the first recombination reaction with a vector having recombination sites capable of recombining with the unreacted sites.

In another aspect, the invention provides methods of cloning nucleic acid molecules comprising (a) providing a first nucleic acid segment flanked by at least a first and a second recombination site and a second nucleic acid segment flanked by at least a third and a fourth recombination site, wherein none of the recombination sites flanking the first and second nucleic acid segments is capable of recombining with any of the other sites flanking the first and second nucleic acid segments; (b) providing a vector comprising at least a fifth, sixth, seventh and eighth recombination site, wherein each of the at least fifth, sixth, seventh and eighth recombination sites is capable of recombining with one of the at least first, second, third and/or fourth recombination sites; and (c) conducting a recombination reaction such that the two nucleic acid segments are recombined into the vector thereby cloning the first and the second nucleic acid segments.

In another specific aspect, the invention provides a method of cloning comprising providing at least a first nucleic acid molecule comprising at least a first and a second recombination site and at least a second nucleic acid molecule comprising at least a third and a fourth recombination site, wherein none of the first, second, third or fourth recombination sites is capable of recombining with any of the other sites, providing one or more vectors (e.g., two, three, four, five, seven, ten, twelve, etc.), comprising at least a fifth, sixth, seventh and eighth recombination site, wherein each of the fifth, sixth, seventh and eighth recombination sites is capable of recombining with one of the first, second, third or fourth recombination sites, and conducting a recombination reaction such that at least said first and second molecules are recombined into said vectors. In a further aspect, the method may allow cloning of at least one additional nucleic acid molecule (e.g., at least a third nucleic acid molecule), wherein said molecule is flanked by a ninth and a tenth recombination site and wherein the vector comprises an eleventh and a twelfth recombination site each of which is capable of recombining with either the ninth or the tenth recombination site.

The invention also specifically relates to a method of cloning comprising providing a first, a second and a third nucleic acid molecule, wherein the first nucleic acid molecule is flanked by at least a first and a second recombination site, the second nucleic acid molecule is flanked by at least a third and a fourth recombination site and the third nucleic acid molecule is flanked by at least a fifth and a sixth recombination site, wherein the second recombination site is capable of recombining with the third recombination site and the fourth recombination site is capable of recombining with the fifth recombination site, providing a vector having at least a seventh and an eighth recombination site, wherein the seventh recombination site is capable of reacting with the first recombination site and the eighth recombination site is capable of reacting with the sixth recombination site, and conducting at least one recombination reaction such that the second and the third recombination sites recombine, the fourth and the fifth recombination sites recombine, the first and the seventh recombination sites recombine and the sixth and the eighth recombination sites recombine thereby cloning the first, second and third nucleic acid segments in said vector.

In another specific aspect, the invention provides a method of cloning comprising providing at least a first, a second and a third nucleic acid molecule, wherein the first nucleic acid molecule is flanked by a first and a second recombination site, the second nucleic acid molecule is flanked by a third and a fourth recombination site and the third nucleic acid molecule is flanked by a fifth and a sixth recombination site, wherein the second recombination site is capable of recombining with the third recombination site and none of the first, fourth, fifth or sixth recombination sites is capable of recombining with any of the first through sixth recombination sites, providing one or more vectors comprising a seventh and an eighth recombination site flanking at least a first selectable marker and comprising a ninth and a tenth recombination site flanking at least a second selectable marker wherein none of the seventh through tenth recombination sites can recombine with any of the seventh through tenth recombination sites, conducting at least one recombination reaction such that the second and the third recombination sites recombine, the first and the fourth recombination sites recombine with the seventh and the eighth recombination sites and the fifth and the sixth recombination sites recombine with the ninth and the tenth recombination sites thereby cloning the first, second and third nucleic acid segments. In some embodiments, the selectable markers may be the same or may be different. Moreover, the one or more selectable markers (e.g., two, three, four, five, seven, etc.) may be negative selectable markers.

The invention also provides methods of cloning n nucleic acid segments, wherein n is an integer greater than 1, comprising (a) providing n nucleic acid segments, each segment flanked by two recombination sites which do not recombine with each other; (b) providing a vector comprising 2n recombination sites, wherein each of the 2n recombination sites is capable of recombining with one of the recombination sites flanking one of the nucleic acid segments; and (c) conducting a recombination reaction such that the n nucleic acid segments are recombined into the vector thereby cloning the n nucleic acid segments. In specific embodiments, the recombination reaction between the n nucleic acid segments and the vector is conducted in the presence of one or more recombination proteins under conditions which favor the recombination. In other specific embodiments, n is 2, 3, 4, or 5.

Thus, the invention generally provides a method of cloning n nucleic acid molecules, wherein n is an integer greater than 1, comprising the steps of providing n nucleic acid molecules, each molecule comprising at least one and preferably two recombination sites (the two recombination sites preferably flank each of the n nucleic acid molecules), providing at least one vector comprising one or more recombination sites (and preferably 2n recombination sites) wherein the recombination sites of the vector are capable of recombining with the recombination sites of the n molecules, and conducting a recombination reaction such that the n nucleic acid molecules are inserted into said vectors thereby cloning the n nucleic acid molecules. The n molecules may be inserted next to or adjoining each other in the vector and/or may be inserted at different positions within the vector. The vectors used for cloning according to the invention preferably comprise n copies of the same or different selectable marker, each copy of which is flanked by at least two recombination sites. Preferably, one or more of the selectable markers are negative selectable markers.
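The use of n negative selectable markers, each flanked by its own pair of recombination sites, can be pictured as a replacement table: a clone survives selection only if every marker has been exchanged for an insert. The following sketch is a toy model under that assumption; the data layout, the function names and the insert names are hypothetical, and ccdB is used as the negative selectable marker because it is one of the examples given above.

```python
def recombine(cassettes, segments):
    """cassettes: list of (site_pair, marker) entries carried by the vector.
    segments: dict mapping a site_pair to the insert flanked by the cognate
    sites. Each marker whose site pair matches an available segment is
    replaced by that segment; unmatched markers stay in place."""
    return [segments.get(site_pair, marker) for site_pair, marker in cassettes]


def survives(contents, negative_markers=frozenset({"ccdB"})):
    """A clone survives negative selection only if no negative selectable
    marker remains in the vector."""
    return not any(item in negative_markers for item in contents)


# Vector with n = 2 cassettes; each site pair is written as a tuple of
# specificity labels (hypothetical notation).
vector = [((1, 2), "ccdB"), ((3, 4), "ccdB")]

both = recombine(vector, {(1, 2): "insert_A", (3, 4): "insert_B"})
only_one = recombine(vector, {(1, 2): "insert_A"})

print(both, survives(both))          # both markers replaced
print(only_one, survives(only_one))  # one marker remains
```

Under this model, only vectors that have received all n inserts are recovered, which is the practical benefit of flanking each marker with sites of distinct specificity.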

The invention also generally relates to a method of cloning n nucleic acid molecules, wherein n is an integer greater than 1, comprising the steps of providing a 1^(st) through an n^(th) nucleic acid molecule, each molecule flanked by at least two recombination sites, wherein the recombination sites are selected such that one of the two recombination sites flanking the i^(th) segment, n_(i), reacts with one of the recombination sites flanking the (i−1)^(th) segment, n_(i−1), and the other recombination site flanking the i^(th) segment, n_(i), reacts with one of the recombination sites flanking the (i+1)^(th) segment, n_(i+1), and providing a vector comprising at least two recombination sites wherein one of the two recombination sites on the vector reacts with one of the sites on the 1^(st) nucleic acid segment and another site on the vector reacts with a recombination site on the n^(th) nucleic acid segment.
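The chained arrangement described above, in which each segment reacts with its two neighbors and the outermost segments react with the vector, fixes the order of the segments in the product. The sketch below illustrates this under simplifying assumptions: each site is reduced to a specificity label, two sites react only when their labels match, and each segment is oriented so that its first label is its upstream site. The function and segment names are hypothetical.

```python
def assemble(vector, segments):
    """vector: (left_label, right_label) of the two sites on the vector.
    segments: dict mapping a segment name to its (left_label, right_label).
    Returns the segment names in the order dictated by the site labels, or
    raises ValueError if the segments do not form one complete chain."""
    by_left = {left: (name, right) for name, (left, right) in segments.items()}
    if len(by_left) != len(segments):
        raise ValueError("two segments share the same upstream specificity")

    order = []
    current = vector[0]  # start at the vector's first site
    while current in by_left:
        name, right = by_left.pop(current)
        order.append(name)
        current = right  # the next segment must match this label

    if by_left or current != vector[1]:
        raise ValueError("segments do not form a single chain in the vector")
    return order


# Three segments supplied in arbitrary order; the labels determine the
# order in the product. Note that n = 3 segments use n + 1 = 4 labels.
parts = {"third": (3, 4), "first": (1, 2), "second": (2, 3)}
print(assemble((1, 4), parts))  # ['first', 'second', 'third']
```

In this model a chain of n segments requires n + 1 distinct specificities (n − 1 internal junctions plus the two junctions with the vector), and omitting any one segment leaves the chain incomplete so that no full-length product is formed.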

The nucleic acid molecules/segments cloned by the methods of the invention can be different types and can have different functions depending on the need and depending on the functional elements present. In one aspect, at least one of the nucleic acid segments cloned according to the invention is operably linked to a sequence which is capable of regulating transcription (e.g., a promoter, an enhancer, a repressor, etc.). For example, at least one of the nucleic acid segments may be operably linked to a promoter which is either an inducible promoter or a constitutive promoter. In yet other specific embodiments, translation of an RNA produced from the cloned nucleic acid segments results in the production of either a fusion protein or all or part of a single protein. In additional specific embodiments, at least one of the nucleic acid segments encodes all or part of an open reading frame and at least one of the nucleic acid segments contains a sequence which is capable of regulating transcription (e.g., a promoter, an enhancer, a repressor, etc.). In further specific embodiments, at least one of the nucleic acid segments produces a sense RNA strand upon transcription and at least one of the nucleic acid segments produces an antisense RNA strand upon transcription. In related embodiments, the sense RNA and antisense RNA have at least one complementary region and are capable of hybridizing to each other. In other specific embodiments, transcription of at least two of the nucleic acid segments results in the production of a single RNA or two separate RNAs. In various specific embodiments, these nucleic acid segments may be connected to each other or may be spatially separated within the same nucleic acid molecule. In specific embodiments, the nucleic acid segments comprise nucleic acid molecules of one or more libraries. Further, these libraries may comprise cDNA, synthetic DNA, or genomic DNA.
In addition, the nucleic acid molecules of these libraries may encode variable domains of antibody molecules (e.g., variable domains of antibody light and heavy chains). In specific embodiments, the invention provides screening methods for identifying nucleic acid molecules which encode proteins having binding specificity for one or more antigens and/or proteins having one or more activities (e.g., secretion from a cell, sub-cellular localization (e.g., localization to the endoplasmic reticulum, the nucleus, mitochondria, chloroplasts, the cell membrane, etc.), ligand binding activity (e.g., small molecules, binding activities for nucleic acids, cell surface receptors, soluble proteins, metal ions, structural elements, protein interaction domains, etc.), enzymatic activity, etc.). Further, nucleic acid molecules/segments cloned using methods of the invention may have one or more of the activities referred to above.

In another aspect, the invention provides methods of cloning at least one nucleic acid molecule comprising (a) providing at least a first, a second and a third nucleic acid segment, wherein the first nucleic acid segment is flanked by at least a first and a second recombination site, the second nucleic acid segment is flanked by at least a third and a fourth recombination site and the third nucleic acid segment is flanked by at least a fifth and a sixth recombination site, wherein the second recombination site is capable of recombining with the third recombination site and none of the first, fourth, fifth or sixth recombination sites is capable of recombining with any of the first through sixth recombination sites; (b) providing a vector comprising at least a seventh and an eighth recombination site flanking at least a first negative selectable marker and comprising at least a ninth and a tenth recombination site flanking at least a second negative selectable marker, wherein none of the seventh through tenth recombination sites can recombine with any of the seventh through tenth recombination sites; (c) conducting a first recombination reaction such that the second and the third recombination sites recombine; and (d) conducting a second recombination reaction such that the first and the fourth recombination sites recombine with the seventh and the eighth recombination sites and the fifth and the sixth recombination sites recombine with the ninth and the tenth recombination sites thereby cloning the first, second and third nucleic acid segments. In related embodiments, the first and second recombination reactions are conducted in the presence of one or more recombination proteins under conditions which favor the recombination. Such first and second recombination reactions may be carried out simultaneously or sequentially.

In another aspect, the invention provides methods of cloning at least one nucleic acid molecule comprising (a) providing a first, a second and a third nucleic acid segment, wherein the first nucleic acid segment is flanked by a first and a second recombination site, the second nucleic acid segment is flanked by a third and a fourth recombination site and the third nucleic acid segment is flanked by a fifth and a sixth recombination site, wherein the second recombination site is capable of recombining with the third recombination site and the fourth recombination site is capable of recombining with the fifth recombination site; (b) providing a vector comprising a seventh and an eighth recombination site; and (c) conducting at least one recombination reaction such that the second and the third recombination sites recombine and the fourth and the fifth recombination sites recombine and the first and the sixth recombination sites recombine with the seventh and the eighth recombination sites respectively, thereby cloning the first, second and third nucleic acid segments. In related embodiments, the recombination reaction is conducted in the presence of one or more recombination proteins under conditions which favor the recombination. In specific embodiments, the recombination sites which recombine with each other comprise att sites having identical seven base pair overlap regions.

In another aspect, the invention provides methods of cloning n nucleic acid fragments, wherein n is an integer greater than 2, comprising (a) providing a 1^(st) through an n^(th) nucleic acid segment, each segment flanked by two recombination sites, wherein the recombination sites are selected such that one of the two recombination sites flanking the i^(th) segment, n_(i), reacts with one of the recombination sites flanking the (i−1)^(th) segment, n_(i−1), and the other recombination site flanking the i^(th) segment reacts with one of the recombination sites flanking the (i+1)^(th) segment, n_(i+1); (b) providing a vector comprising at least two recombination sites, wherein one of the two recombination sites on the vector reacts with one of the sites on the 1^(st) nucleic acid segment and another site on the vector reacts with a recombination site on the n^(th) nucleic acid segment; and (c) conducting at least one recombination reaction such that all of the nucleic acid fragments are recombined into the vector. In specific embodiments, the recombination reaction is conducted in the presence of one or more recombination proteins under conditions which favor the recombination.

In specific embodiments of the methods described above, multiple nucleic acid segments are inserted into another nucleic acid molecule. While numerous variations of such methods are possible, in specific embodiments, nucleic acid segments which contain recombination sites having different specificities (e.g., attL1 and attL2) are inserted into a vector which contains more than one set of cognate recombination sites (e.g., attR1 and attR2), each set of which flanks negative selection markers. Thus, recombination at cognate sites can be used to select for nucleic acid molecules which have undergone recombination at one or more of the recombination sites. The nucleic acid segments which are inserted into the vector may be the same or different. Further, these nucleic acid segments may encode expression products or may be transcriptional control sequences. When the nucleic acid segments encode expression products, vectors of the invention may be used to amplify the copy number or increase expression of encoded products. Further, when nucleic acid segments are inserted in both direct and inverted orientations, vectors of the invention may be used, for example, to express RNAi, as described elsewhere herein. When the nucleic acid segments encode sequences which regulate transcription (e.g., promoters, enhancers, etc.), vectors of the invention may be used to place multiple regulatory elements in operable linkage with nucleic acid that encodes expression products. Vectors of this nature may be used to increase expression of expression products, for example, by providing multiple binding sites for proteins which activate transcription. Similarly, vectors of this nature may be used to decrease expression of expression products, for example, by providing multiple binding sites for proteins which inhibit transcription. Vectors of this nature may be used to increase or decrease the expression of expression products, for example, by the expression of multiple copies of nucleic acid molecules which encode factors involved in the regulation of transcription. Other embodiments related to the above would be apparent to one skilled in the art.

In another aspect, the invention provides methods of cloning at least one nucleic acid molecule comprising (a) providing a first population of nucleic acid molecules wherein all or a portion of such molecules are flanked by at least a first and a second recombination site; (b) providing at least one nucleic acid segment flanked by at least a third and a fourth recombination site, wherein either the first or the second recombination site is capable of recombining with either the third or the fourth recombination site; (c) conducting a recombination reaction such that all or a portion of the nucleic acid molecules in the population are recombined with the segment to form a second population of nucleic acid molecules; and (d) cloning the second population of nucleic acid molecules. In related embodiments, the recombination reaction is conducted in the presence of one or more recombination proteins under conditions which favor the recombination. In specific embodiments, the second population of nucleic acid molecules encodes a fusion protein. In related embodiments, the nucleic acid segment encodes a polypeptide which comprises a sequence (preferably an N-terminal and/or a C-terminal tag sequence) encoding all or a portion of the following: the Fc portion of an immunoglobulin, an antibody, a β-glucuronidase, a fluorescent protein (e.g., green fluorescent protein, yellow fluorescent protein, red fluorescent protein, cyan fluorescent protein, etc.), a transcription activation domain, a protein or domain involved in translation, a protein localization tag, a protein stabilization or destabilization sequence, a protein interaction domain, a binding domain for DNA, a protein substrate, a purification tag (e.g., an epitope tag, maltose binding protein, a six histidine tag, glutathione S-transferase, etc.), and an epitope tag.

In another aspect, the invention provides methods of cloning at least one nucleic acid molecule comprising (a) providing a first population of nucleic acid molecules wherein all or a portion of such molecules are flanked by at least a first and a second recombination site; (b) providing a second population of nucleic acid molecules wherein all or a portion of such molecules are flanked by a third and a fourth recombination site, wherein either the first or the second recombination site is capable of recombining with either the third or the fourth recombination site; (c) conducting a recombination reaction such that all or a portion of the molecules in the first population are recombined with one or more molecules from the second population to form a third population of nucleic acid molecules; and (d) cloning the third population of nucleic acid molecules. In related embodiments, the recombination reaction is conducted in the presence of one or more recombination proteins under conditions which favor the recombination.
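The combinatorial nature of recombining one population with another can be illustrated by enumerating every pairing. The sketch below is a simplified illustration only: it assumes that every member of the first population can recombine with every member of the second, which is the upper bound of the "all or a portion" language above, and the member names and the junction label are hypothetical.

```python
from itertools import product


def combine_populations(first, second, junction="-"):
    """Return every pairwise product obtained by joining a member of the
    first population to a member of the second population."""
    return [a + junction + b for a, b in product(first, second)]


# Hypothetical libraries of two and three members.
first_population = ["VH1", "VH2"]
second_population = ["VL1", "VL2", "VL3"]

third_population = combine_populations(first_population, second_population)
print(len(third_population))  # 2 x 3 = 6 possible products
print(third_population[0])    # VH1-VL1
```

The size of the third population is at most the product of the sizes of the two starting populations, which is why joining two libraries by recombination can generate a large number of combinations from a modest number of starting molecules.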

Thus, the invention generally provides methods of joining at least two segments of nucleic acid (including joining populations of nucleic acid molecules), comprising (a) providing at least two segments of nucleic acid (one or both of which may be derived from a population or library of molecules), each segment comprising at least one recombination site capable of recombining with a recombination site present on another (or second) segment; and (b) contacting the segments with one or more recombination proteins under conditions causing recombination between the recombination sites, thereby joining the segments. The invention further provides compositions comprising the joined nucleic acid segments (or population of segments) prepared by such methods, hosts or host cells comprising such joined nucleic acid segments (which may be populations of host cells or recombinant host cells), and methods of making such hosts or host cells (such as by transforming or transfecting such cells with product molecules of the invention). In specific embodiments, methods of the invention further comprise inserting the joined nucleic acid segments into one or more vectors. The invention also relates to hosts or host cells containing such vectors. In additional specific embodiments, at least one of the two segments of nucleic acid encodes an expression product (e.g., a selectable marker, an enzyme, a ribozyme, etc.) having one or more identifiable activities. In yet other specific embodiments, at least one of the two segments of nucleic acid contains all or part of an open reading frame (ORF). In another aspect, at least one of the two segments of nucleic acid contains a sequence which is capable of regulating transcription (e.g., a promoter, an enhancer, a repressor, etc.). In a specific aspect, one segment encodes an ORF and the other encodes a sequence capable of regulating transcription and/or translation and the recombination reaction allows such sequences to be operably linked.
In yet other additional specific embodiments, one or more of the nucleic acid segments encode a selectable marker or contain an origin of replication. In further specific embodiments, some or all of the nucleic acid segments comprise nucleic acid molecules of one or more libraries. In certain specific embodiments, the one or more libraries comprise polynucleotides which encode variable domains of antibody molecules. In related embodiments, at least one of the nucleic acid segments encodes a polypeptide linker for connecting variable domains of antibody molecules and/or one or more libraries comprise polynucleotides which encode variable domains of antibody light and heavy chains. In specific embodiments, methods of the invention further comprise at least one screening step to identify nucleic acid molecules which encode proteins having one or more identifiable activities (e.g., binding specificities for one or more antigens, enzymatic activities, activities associated with selectable markers, etc.). Thus, the invention can be used to produce modified expression products (by variably linking different segments and/or replacing and/or deleting segments) and to analyze the expression products for desired activities. According to the invention, portions of genes and/or a number of genes can be linked to express novel proteins or novel compounds and to select for activities of interest. As described herein, substitution and/or deletions of such linked molecules can also be used to produce altered or modified proteins or compounds for testing. In one aspect, biological pathways can be modified by the methods of the invention to, for example, use different enzymes or mutant enzymes in a particular pathway (e.g., link different enzymes or mutant enzymes which participate in reactions in the same biological pathway).
Such modification to biological pathways according to the invention leads to (1) the production of potentially novel compounds such as antibiotics or carbohydrates or (2) unique post-translational modification of proteins (e.g., glycosylation, sialylation, etc.). The invention also allows for production of novel enzymes by manipulating or changing subunits of multimeric enzyme complexes. In other specific embodiments, the invention also provides methods of altering properties of a cell comprising introducing into the cell nucleic acid segments produced by the methods described herein. In certain specific embodiments, cells altered or produced by methods of the invention are either fungal cells or bacterial cells (e.g., Escherichia coli).

The invention further provides methods for altering biological pathways and generating new biological pathways. For example, genes encoding products involved in a particular pathway (e.g., a pathway which leads to the production of an antibiotic) may be altered using methods of the invention. These alterations include the deletion, replacement, and/or mutation of one or more genes which encode products that participate in the pathway. In addition, regions of genes may be deleted or exchanged, followed by screening to identify, for example, pathway products having particular features (e.g., a particular methylation pattern). Further, genes of different organisms which perform similar but different functions may be combined to produce novel products. Further, these products may be identified by screening for specific functional properties (e.g., the ability to inhibit an enzymatic reaction, binding affinity for a particular ligand, antimicrobial activity, antiviral activity, etc.). Thus, the invention provides, in one aspect, screening methods for identifying compounds which are produced by expression products of nucleic acid molecules of the invention.

Further, when the nucleic acid segments which encode one or more expression products involved in a particular biological pathway or process have been assembled into one or more nucleic acid molecules, regions of these molecules (e.g., regions which encode expression products) may be deleted or replaced to generate nucleic acid molecules which, for example, express additional expression products, express altered expression products, or do not express one or more expression products involved in the biological pathway or process. Further, nucleic acid segments which encode one or more expression products involved in a particular biological pathway or process may be deleted or inserted as a single unit. These methods find application in the production and screening of novel products. In particular, the invention also includes novel products produced by the expression products of nucleic acid molecules described herein.

In another aspect, the invention provides methods for preparing and identifying nucleic acid molecules containing two or more nucleic acid segments which encode gene products involved in the same biological process or biological pathway, as well as unrelated biological processes or biological pathways, comprising (a) providing a first population of nucleic acid molecules comprising at least one recombination site capable of recombining with other nucleic acid molecules in the first population; (b) contacting the nucleic acid molecules of the first population with one or more recombination proteins under conditions which cause the nucleic acid molecules to recombine and create a second population of nucleic acid molecules; and (c) screening the second population of nucleic acid molecules to identify a nucleic acid molecule which encodes two or more products involved in the same process or pathway. In specific embodiments of the invention, the nucleic acid molecule which encodes two or more products involved in the same process or pathway encodes two different domains of a protein or protein complex. In other specific embodiments, the protein is a single-chain antigen-binding protein. In yet other specific embodiments, the protein complex comprises an antibody molecule or multivalent antigen-binding protein comprising at least two single-chain antigen-binding proteins. The invention further provides methods similar to those described above for preparing and identifying nucleic acid molecules containing two or more nucleic acid segments which encode gene products involved in different or unrelated biological processes or biological pathways.

Methods of the invention may also be employed to determine the expression profile of genes in cells and/or tissues. In one embodiment, RNA may be obtained from cells and/or tissues and used to generate cDNA molecules. These cDNA molecules may then be linked to each other and sequenced to identify genes which are expressed in cells and/or tissues, as well as the prevalence of RNA species in these cells and/or tissues. Thus, in one aspect, the invention provides methods for identifying genes expressed in particular cells and/or tissues and the relative quantity of particular RNA species present in these cells and/or tissues as compared to the quantity of other RNA species. As discussed below, such methods may be used for a variety of applications including diagnostics, gene discovery, the identification of genes expressed in specific cell and/or tissue types, the identification of genes which are over- or under-expressed in particular cells (e.g., cells associated with a pathological condition), the screening of agents to identify agents (e.g., therapeutic agents) which alter gene expression, etc. Further, it will often be possible to identify the gene from which a particular RNA species or segment is transcribed by comparison of the sequence data obtained by methods of the invention to nucleic acid sequences cataloged in public databases. Generally, about 10 nucleotides or so of sequence data will be required to identify the gene from which RNA has been transcribed.

Thus, in a specific aspect, the invention provides methods for determining gene expression profiles in cells or tissues comprising (a) generating at least one population of cDNA molecules from RNA obtained from the cells or tissues, wherein the individual cDNA molecules of the population comprise at least two recombination sites capable of recombining with at least one recombination site present on the individual members of the same or a different population of cDNA molecules; (b) contacting the nucleic acid molecules of (a) with one or more recombination proteins under conditions which cause the nucleic acid molecules to join; and (c) determining the sequence of the joined nucleic acid molecules. In specific embodiments of the invention, the joined cDNA molecules are inserted into vectors which contain sequencing primer binding sites flanking the insertion sites. In yet other specific embodiments, the joined cDNA molecules are separated by attB recombination sites. In additional specific embodiments, the joined cDNA molecules contain between about 10 and about 30 nucleotides which correspond to the RNA obtained from the cell or tissue.
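By way of illustration only, the analysis that follows step (c) can be sketched in Python as below. The sketch assumes that the sequence of a joined molecule is available as a text string and that the joined cDNA segments are separated by a known marker. The names count_tags and relative_abundance, the placeholder separator "-attB-" (which is not a real attB sequence), and the example segments are all hypothetical and are not taken from this disclosure.

```python
from collections import Counter

def count_tags(joined_read, separator, min_len=10, max_len=30):
    """Split a joined read at each separator and count the cDNA segments.

    Only segments of about 10 to about 30 nucleotides are kept, matching
    the segment length range given in the text above.
    """
    tags = [t for t in joined_read.split(separator)
            if min_len <= len(t) <= max_len]
    return Counter(tags)

def relative_abundance(counts):
    """Convert raw segment counts into fractions of all counted segments."""
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

# Hypothetical placeholder marking where an attB site separates two segments.
SEP = "-attB-"

# Hypothetical joined molecule made of four 12-nucleotide segments.
read = SEP.join(["ACGTACGTACGT", "TTGACCATGGCA",
                 "ACGTACGTACGT", "GGGCCCAAATTT"])

counts = count_tags(read, SEP)
freqs = relative_abundance(counts)
print(counts["ACGTACGTACGT"])   # 2
print(freqs["TTGACCATGGCA"])    # 0.25
```

In this sketch the frequency of a segment among all counted segments stands in for the relative quantity of the corresponding RNA species. In practice, each segment would first be matched against a sequence database to identify the gene from which it was transcribed.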

Once the sequences of cDNA corresponding to RNA expression products have been determined, these sequences can be compared to databases which contain the sequences of known genes to determine which genes are expressed in the particular cells and/or tissues and the expression levels of individual genes. Further, the expression levels of genes can be determined using methods of the invention under particular conditions to determine if these conditions result in the alteration of the expression of one or more genes. Examples of such conditions include decreased activity of cellular gene expression products, nutrient limitation and/or deprivation, heat shock, low temperatures, contact with solutions having low or high ionic strengths, exposure to chemical agents (e.g., antibiotics, chemotherapeutic agents, metal ions, mutagens, etc.), ionizing radiation, etc. Thus, the invention provides methods for identifying genes which exhibit alterations in expression as a result of specific stimuli.

The invention further provides methods for identifying genes involved in cellular metabolism (e.g., pathological conditions). For example, methods of the invention can be used to determine the expression profile of cells of a particular strain or cells which exhibit an aberrant phenotype. The expression profile of cells of the particular strain or cells which exhibit the aberrant phenotype is compared to the expression profile of cells of another strain or cells which do not exhibit the aberrant phenotype, referred to herein as “reference cells.” By comparison of expression profiles of genes of cells of the particular strain or cells which exhibit the aberrant phenotype to appropriate reference cells, expression characteristics associated with the strain or aberrant phenotype can be determined. Thus, in one specific aspect, the invention provides diagnostic methods, wherein the gene expression profiles of cells of a patient which exhibit an aberrant phenotype (e.g., cancerous) are compared to the gene expression profiles of cells which do not exhibit the aberrant phenotype (i.e., reference cells).
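For purposes of illustration, a comparison of an expression profile to that of reference cells might be sketched as follows. The sketch assumes that each profile is a mapping from a gene name to a count that has already been normalized so that the two profiles are comparable. The function names, the pseudocount of 1, the two-fold threshold, and the gene names and counts are hypothetical choices made for this example and are not prescribed by this disclosure.

```python
import math

def log2_fold_changes(sample, reference, pseudocount=1.0):
    """Return log2(sample / reference) for every gene seen in either profile.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes that were not detected in one of the two profiles.
    """
    genes = set(sample) | set(reference)
    return {
        g: math.log2((sample.get(g, 0) + pseudocount)
                     / (reference.get(g, 0) + pseudocount))
        for g in genes
    }

def altered_genes(sample, reference, threshold=1.0):
    """Keep genes whose expression differs by at least 2**threshold fold."""
    changes = log2_fold_changes(sample, reference)
    return {g: lfc for g, lfc in changes.items() if abs(lfc) >= threshold}

# Hypothetical counts for cells with an aberrant phenotype and reference cells.
aberrant = {"geneA": 31, "geneB": 7, "geneC": 3}
reference = {"geneA": 7, "geneB": 7, "geneC": 15}

altered = altered_genes(aberrant, reference)
print(altered["geneA"])    # 2.0  (over-expressed relative to reference cells)
print(altered["geneC"])    # -2.0 (under-expressed relative to reference cells)
print("geneB" in altered)  # False (unchanged)
```

The same comparison could be applied to cells exposed to a particular condition or agent versus unexposed reference cells.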

In another specific aspect, the invention provides methods for screening therapeutic agents (e.g., immunostimulatory agents) comprising (a) exposing cells (e.g., human cells) to a candidate therapeutic agent, (b) determining the gene expression profile of the exposed cells, and (c) comparing the gene expression profile to the gene expression profile of cells which have not been exposed to the candidate therapeutic agent (i.e., reference cells). The invention further includes therapeutic agents identified by the methods described above.

In another aspect, the invention provides a means for attaching or binding through recombination molecules and/or compounds or populations of molecules and/or compounds to other molecules, compounds and/or supports (preferably solid or semisolid). Suitable molecules and compounds for use in the present invention include, but are not limited to, proteins, polypeptides, or peptides, chemical compounds, drugs, lipids, lipoproteins, carbohydrates, hormones, steroids, antibodies (or portions thereof), antigens, enzymes (e.g., nucleases, polymerases, etc.), polysaccharides, nucleosides and derivatives thereof, nucleotides and derivatives thereof, amino acids and derivatives thereof, fatty acids, receptors, ligands, haptens, small molecules (e.g., activation groups such as -COOH), binding molecules (e.g., biotin, avidin, streptavidin, Protein A, Protein B, etc.), growth factors, metal ions, cytokines, ribozymes, or nucleic acid molecules (e.g., RNA, DNA, DNA/RNA hybrids, cDNA or cDNA libraries, double stranded nucleic acids, single stranded nucleic acids, linear nucleic acids, circular nucleic acids, supercoiled nucleic acids and the like) and combinations of two or more of the foregoing. In specific embodiments, molecules may be linked to supports either directly or indirectly. Further, molecules may be linked to supports either covalently or non-covalently. For purposes of illustration, one example of the indirect non-covalent linkage of a nucleic acid molecule to a support is where a protein which exhibits high binding affinity for nucleic acid molecules is directly linked to a support. The support containing this protein is then contacted with the nucleic acid molecules under appropriate conditions resulting in the non-covalent attachment of the nucleic acid molecules to the support through the protein. This nucleic acid molecule/protein interaction can be either sequence specific or non-sequence specific.

In another aspect, the invention provides supports comprising (either bound or unbound to the support) at least one first nucleic acid molecule, wherein the first nucleic acid molecule comprises one or more recombination sites or portions thereof. In specific embodiments, supports of the invention further comprise at least one second nucleic acid molecule or at least one peptide or protein molecule or other compound bound to the supports through the recombination site on the first nucleic acid molecule.

The invention also relates to supports of the invention which comprise (either bound or unbound to the support) one or more components selected from the group consisting of one or more nucleic acid molecules comprising at least one recombination site, one or more recombination proteins, and one or more peptides or compounds comprising at least one recombination site.

In another aspect, the invention provides methods for attaching or binding one or more nucleic acid molecules, protein or peptide molecules, or other compounds to supports comprising (a) obtaining at least one nucleic acid molecule, protein or peptide molecule, other compounds, or population of such molecules or compounds comprising at least one recombination site and obtaining supports comprising at least one recombination site; and (b) causing some or all of the recombination sites on the at least one nucleic acid molecule, protein or peptide molecule, other compounds, or population of such molecules or compounds to recombine with all or a portion of the recombination sites comprising the supports. In specific embodiments of the invention, the methods further comprise attaching or binding one or more nucleic acid molecules to the supports. In other specific embodiments, only one nucleic acid molecule is directly linked to the support. In yet other specific embodiments, the nucleic acid molecules form microarrays. In even more specific embodiments, the microarrays form a DNA chip. The invention further provides supports prepared by the methods described above. In specific embodiments, the supports of the invention are either solid or semisolid. Further, as discussed above, nucleic acid molecules may be linked to supports either directly or indirectly. As also discussed above, nucleic acid molecules may be linked to supports either covalently or non-covalently. In addition, nucleic acid molecules may be linked to supports through linkage to a protein or small molecule (e.g., a molecule having an activation group such as -COOH). Further, nucleic acid molecules may be linked to supports through linkages which are either labile or non-labile.

In another aspect, the invention provides methods for linking or connecting two or more molecules or compounds of interest, comprising (a) providing at least a first and a second molecule or compound of interest, each of the first and second molecules or compounds of interest comprising at least one recombination site; and (b) causing some or all of the recombination sites on the first molecule or compound of interest to recombine with some or all of the recombination sites on the second molecule or compound of interest. In specific embodiments of the invention, the methods further comprise attaching nucleic acids comprising recombination sites to the first and the second molecules or compounds of interest. In other specific embodiments, at least one of the molecules or compounds of interest comprises a protein or peptide, a nucleic acid, a carbohydrate, a steroid, or a lipid.

In some embodiments, one or more of the compounds and/or molecules of the invention (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) may comprise one or more recombination sites (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) or portions thereof. Such molecules and/or compounds may be unlabeled or detectably labeled by methods well known in the art. Detectable labels include, but are not limited to, radioactive labels, mass labels, fluorescent labels, chemiluminescent labels, bioluminescent labels, and enzyme labels. Use of such labels may allow for the detection of the presence or absence of labeled molecules and/or compounds on a support. Thus, the invention generally relates to attaching to a support any number of molecules and/or compounds or populations of molecules and/or compounds by recombination and the supports made by this method. Such compounds and/or molecules can thus be attached to a support or structure via a nucleic acid linker containing a recombination site or portion thereof. Such linkers are preferably small (e.g., 5, 20, 30, 50, 100, 200, 300, 400, or 500 base pairs in length).

Accordingly, the present invention encompasses a support comprising one or a number of recombination sites (or portions thereof) which can be used according to this aspect of the invention. Thus, one or a number of nucleic acid molecules, or proteins, peptides and/or other molecules and/or compounds having one or more recombination sites or portions thereof which are to be added or attached or bound to the support are recombined by a recombination reaction with the recombination-site-containing support, thereby creating a support containing one or more nucleic acid molecules, or proteins, peptides and/or other molecules and/or compounds of interest. The recombination reaction in binding the molecule and/or compound of interest to the support is preferably accomplished in vitro by contacting the support and the molecule and/or compound of interest with at least one recombination protein under conditions sufficient to cause recombination of at least one recombination site on the molecule and/or compound of interest with at least one recombination site present on the support. This aspect of the invention is particularly useful in creating arrays of nucleic acids, or proteins and/or other molecules and/or compounds on one or more supports (e.g., two, three, four, five, seven, ten, twelve, etc.) in that it facilitates binding of a number of the same or different nucleic acids, or proteins and/or other molecules and/or compounds of interest through recombination to the support or various parts of the support. Thus, the invention relates to a method of attaching or binding one or more (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) nucleic acid, or protein molecules and/or other molecules and/or compounds to a support comprising:

(a) obtaining at least a first molecule and/or compound or population of molecules and/or compounds comprising at least one recombination site (e.g., the starting nucleic acid molecules of the invention) and obtaining a support comprising at least one recombination site (which may also be the starting molecules of the invention); and

(b) causing some or all of the recombination sites on said at least first molecule and/or compound or population of molecules and/or compounds to recombine with all or a portion of the recombination sites on the support.

Once the molecules and/or compounds are added to the support, the presence or absence or position of such molecules and/or compounds on the support can be determined (for example by using detectable labels). Additionally, the molecules and/or compounds bound to the support may be further manipulated by well known techniques.

In addition to joining one or multiple molecules and/or compounds to a support in accordance with the invention, the invention also allows replacement, insertion, or deletion of one or more molecules and/or compounds contained by the support. As discussed herein, by causing recombination of specific sites within a molecule and/or compound of interest, all or a portion of the molecule and/or compound may be removed or replaced with another molecule or compound of interest. This process may also be applied to molecules and/or compounds having recombination sites which are attached to the support. Thus, recombination may be used to remove or replace all or a portion of the molecule and/or compound of interest from the support, in addition to adding all or part of molecules to supports.

The molecules and/or compounds added to the support or removed from the support may be further manipulated or analyzed in accordance with the invention and as described herein. For example, further analysis or manipulation of molecules and/or compounds bound to or removed from the support includes sequencing, hybridization (DNA, RNA etc.), amplification, nucleic acid synthesis, protein or peptide expression, protein-DNA interactions (2-hybrid or reverse 2-hybrid analysis), interaction or binding studies with other molecules and/or compounds, homologous recombination or gene targeting, and combinatorial library analysis and manipulation. Such manipulation may be accomplished while the molecules and/or compounds are bound to the support or after the molecules and/or compounds are removed from the support.

In accordance with the invention, any solid or semi-solid supports may be used and sequences containing recombination sites (or portions thereof) may be added by well known techniques for attaching nucleic acids to supports. Furthermore, recombination sites may be added to nucleic acid, protein molecules and/or other molecules and/or compounds of interest by techniques well known in the art. Moreover, any wild-type or mutant recombination sites or combinations of the same or different recombination sites may be used for adding and removing molecules and/or compounds of interest to or from a support.

The invention also relates to any support comprising one or more recombination sites (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) or portions thereof and to supports comprising nucleic acid, protein molecules and/or other molecules and/or compounds having one or more recombination sites (or portions thereof) bound to said support.

The invention also relates to compositions comprising such supports of the invention. Such compositions may further comprise one or more recombination proteins (preferably site specific recombination proteins), suitable buffers (e.g., for causing recombination), nucleic acid, protein molecules and/or other molecules and/or compounds, preferably comprising recombination sites which may be unbound to the support, and any other reagents used for recombining recombination sites according to the invention (and combinations thereof). The invention also relates to compositions for use in further manipulating or analyzing the supports of the invention or the nucleic acid or protein molecules or other molecules and/or compounds attached thereto. Further manipulation and analysis may be performed on the nucleic acids, proteins, and/or other molecules and/or compounds while bound to the support or after removal from the support. Such compositions may comprise suitable buffers and enzymes such as restriction enzymes, polymerases, ligases, recombination proteins, and the like.

In another aspect, the present invention provides a means for attaching or binding one or more (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) molecules and/or compounds or populations of molecules and/or compounds to one or more of the same or different molecules and/or compounds or populations of molecules and/or compounds. Thus, the invention generally relates to connecting any number of molecules and/or compounds or populations of molecules and/or compounds by recombination. As described herein, such linked molecules and/or compounds may be unlabeled or detectably labeled. Further, such linked molecules and/or compounds may be linked either covalently or non-covalently. Suitable molecules and/or compounds include, but are not limited to, those described herein such as nucleic acids, proteins or peptides, chemical compounds, drugs, lipids, lipoproteins, hormones, etc. In one aspect, the same molecules and/or compounds, or the same type of molecules and/or compounds (e.g., protein-protein, nucleic acid-nucleic acid, etc.) may be linked through recombination. Thus, in one aspect, small molecules and/or proteins may be linked to recombination sites and then linked to each other in various combinations.

In another aspect, different molecules and/or compounds or different types of molecules and/or compounds (e.g., protein-nucleic acid, nucleic acid-ligand, protein-ligand, etc.) may be linked through recombination. Additionally, the molecules and/or compounds linked through recombination (e.g., protein-protein, protein-ligand, etc.) may be attached to a support or structure through recombination as described herein. Thus, the molecules and/or compounds (optionally linked to a support) produced are linked by one or more recombination sites (or portions thereof). Such recombination sites (or portions thereof) may be attached to molecules such as proteins, peptides, carbohydrates, steroids and/or lipids or combinations thereof using conventional technologies and the resulting recombination-site-containing molecules and/or compounds may be linked using the methods of the present invention. Further, the resultant linked molecules and/or compounds may be attached via one or more of the recombination sites to other molecules and/or compounds comprising recombination sites. For example, a nucleic acid comprising a recombination site may be attached to a molecule of interest and a second nucleic acid comprising a compatible recombination site may be attached to a second molecule of interest. Recombination between the sites results in the attachment of the two molecules via a small nucleic acid linker. The nucleic acid linker may be any length depending on the need but preferably is small (e.g., from about 5 to about 500 base pairs in length). Using this methodology, proteins, peptides, nucleic acids, carbohydrates, steroids and/or lipids or combinations thereof may be attached to proteins, peptides, nucleic acids, carbohydrates, steroids and/or lipids or combinations thereof. Thus, the present invention provides a method of connecting two or more molecules and/or compounds, comprising the steps of:

(a) obtaining at least a first and a second molecule and/or compound, each of said molecules and/or compounds comprising at least one recombination site (or portion thereof); and

(b) causing some or all of the recombination sites (or portions thereof) on said first molecule and/or compound to recombine with all or a portion of the recombination sites (or portions thereof) on said second molecule and/or compound.

In some preferred embodiments, a recombination site may be attached to a molecule of interest using conventional conjugation technology. For example, oligonucleotides comprising the recombination site can be synthesized so as to include one or more reactive functional moieties (e.g., two, three, four, five, seven, ten, etc.) which may be the same or different. Suitable reactive functional moieties include, but are not limited to, amine groups, epoxy groups, vinyl groups, thiol groups and the like. The synthesis of oligonucleotides comprising one or more reactive functional moieties is routine in the art. Once synthesized, oligonucleotides comprising one or more reactive functional moieties may be attached to one or more reactive groups (e.g., two, three, four, five, seven, ten, etc.) present on the molecule or compound of interest. The oligonucleotides may be attached directly by reacting one or more of the reactive functional moieties with one or more of the reactive functional groups. In some embodiments, the attachment may be effected using a suitable linking group capable of reacting with one or more of the reactive functional moieties present on the oligonucleotide and with one or more of the reactive groups present on the molecule of interest. In other embodiments, both direct attachment and attachment through a linking group may be used. Those skilled in the art will appreciate that the reactive functional moieties on the oligonucleotide may be the same as or different from the reactive functional moieties on the molecules and/or compounds of interest. Suitable reagents and techniques for conjugation of the oligonucleotide to the molecule of interest may be found in Hermanson, Bioconjugate Techniques, Academic Press Inc., San Diego, Calif., 1996.

The present invention also relates to kits for carrying out the methods of the invention, and particularly for use in creating the product nucleic acid molecules of the invention or other linked molecules and/or compounds of the invention (e.g., protein-protein, nucleic acid-protein, etc.), or supports comprising such product nucleic acid molecules or linked molecules and/or compounds. The invention also relates to kits for adding and/or removing and/or replacing nucleic acids, proteins and/or other molecules and/or compounds to or from one or more supports, for creating and using combinatorial libraries of the invention, and for carrying out homologous recombination (particularly gene targeting) according to the methods of the invention. The kits of the invention may also comprise further components for further manipulating the recombination site-containing molecules and/or compounds produced by the methods of the invention. The kits of the invention may comprise one or more nucleic acid molecules of the invention (particularly starting molecules comprising one or more recombination sites and optionally comprising one or more reactive functional moieties), one or more molecules and/or compounds of the invention, one or more supports of the invention and/or one or more vectors of the invention. Such kits may optionally comprise one or more additional components selected from the group consisting of one or more host cells (e.g., two, three, four, five, etc.), one or more reagents for introducing (e.g., by transfection or transformation) molecules or compounds into one or more host cells, one or more nucleotides, one or more polymerases and/or reverse transcriptases (e.g., two, three, four, five, etc.), one or more suitable buffers (e.g., two, three, four, five, etc.), one or more primers (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.), one or more terminating agents (e.g., two, three, four, five, seven, ten, etc.), one or more populations of molecules for creating combinatorial libraries (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) and one or more combinatorial libraries (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.). The kits of the invention may also contain directions or protocols for carrying out the methods of the invention.

In another aspect, the invention provides kits for joining, deleting, or replacing nucleic acid segments, these kits comprising at least one component selected from the group consisting of (1) one or more recombination proteins or compositions comprising one or more recombination proteins, and (2) at least one nucleic acid molecule comprising one or more recombination sites (preferably a vector having at least two different recombination specificities). The kits of the invention may also comprise one or more components selected from the group consisting of (a) additional nucleic acid molecules comprising additional recombination sites; (b) one or more enzymes having ligase activity; (c) one or more enzymes having polymerase activity; (d) one or more enzymes having reverse transcriptase activity; (e) one or more enzymes having restriction endonuclease activity; (f) one or more primers; (g) one or more nucleic acid libraries; (h) one or more supports; (i) one or more buffers; (j) one or more detergents or solutions containing detergents; (k) one or more nucleotides; (l) one or more terminating agents; (m) one or more transfection reagents; (n) one or more host cells; and (o) instructions for using the kit components.

Further, kits of the invention may contain one or more recombination proteins selected from the group consisting of Cre, Int, IHF, Xis, Flp, Fis, Hin, Gin, Cin, Tn3 resolvase, ΦC31, TndX, XerC, and XerD.

In addition, recombination sites of kits of the invention will generally have different recombination specificities, each comprising att sites with different seven base pair overlap regions. In specific embodiments of the invention, the first three nucleotides of these seven base pair overlap regions comprise nucleotide sequences selected from the group consisting of AAA, AAC, AAG, AAT, ACA, ACC, ACG, ACT, AGA, AGC, AGG, AGT, ATA, ATC, ATG, ATT, CAA, CAC, CAG, CAT, CCA, CCC, CCG, CCT, CGA, CGC, CGG, CGT, CTA, CTC, CTG, CTT, GAA, GAC, GAG, GAT, GCA, GCC, GCG, GCT, GGA, GGC, GGG, GGT, GTA, GTC, GTG, GTT, TAA, TAC, TAG, TAT, TCA, TCC, TCG, TCT, TGA, TGC, TGG, TGT, TTA, TTC, TTG, and TTT.
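The group recited above consists of all 4 × 4 × 4 = 64 possible three-nucleotide sequences, listed in alphabetical order. As a minimal illustrative sketch, the group can be generated as shown below. The helper overlap_prefix and the two seven base pair example sequences are hypothetical and are used only to show how the first three nucleotides of an overlap region might be extracted.

```python
from itertools import product

BASES = "ACGT"

# All 4**3 = 64 three-nucleotide sequences, in alphabetical order.
TRIPLETS = ["".join(p) for p in product(BASES, repeat=3)]

def overlap_prefix(overlap):
    """Return the first three nucleotides of a seven base pair overlap region."""
    if len(overlap) != 7 or set(overlap) - set(BASES):
        raise ValueError("expected seven nucleotides drawn from A, C, G, T")
    return overlap[:3]

# Hypothetical overlap regions, invented for this example only.
overlap_1 = "AAATACA"
overlap_2 = "AACTACA"

print(len(TRIPLETS), TRIPLETS[0], TRIPLETS[-1])   # 64 AAA TTT
print(overlap_prefix(overlap_1) in TRIPLETS)      # True
print(overlap_prefix(overlap_1) != overlap_prefix(overlap_2))  # True
```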

In specific embodiments, kits of the invention contain compositions comprising one or more recombination proteins capable of catalyzing recombination between att sites. In related embodiments, these compositions comprise one or more recombination proteins capable of catalyzing attB×attP (BP) reactions, attL×attR (LR) reactions, or both BP and LR reactions.
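For purposes of illustration only, the relationship between the BP and LR reactions can be modeled as a simple lookup. In the phage lambda system, recombination of an attB site with an attP site yields an attL site and an attR site, and recombination of an attL site with an attR site yields an attB site and an attP site. The sketch below assumes that each site is written as a (type, specificity) pair and that sites recombine only when their specificities match. The integer specificity labels and the function name recombine are hypothetical conventions adopted for this example.

```python
# Site types produced by each reaction, keyed by the pair of reacting types.
PRODUCTS = {
    frozenset({"attB", "attP"}): ("attL", "attR"),  # BP reaction
    frozenset({"attL", "attR"}): ("attB", "attP"),  # LR reaction
}

def recombine(site_a, site_b):
    """Return the product sites of a BP or LR reaction, or None.

    Each site is a (type, specificity) tuple such as ("attB", 1). Sites of
    different specificity, or of types that do not form a BP or LR pair,
    are treated as unable to recombine in this simplified model.
    """
    (type_a, spec_a), (type_b, spec_b) = site_a, site_b
    if spec_a != spec_b:
        return None
    products = PRODUCTS.get(frozenset({type_a, type_b}))
    if products is None:
        return None
    return tuple((site_type, spec_a) for site_type in products)

print(recombine(("attB", 1), ("attP", 1)))  # (('attL', 1), ('attR', 1))
print(recombine(("attL", 2), ("attR", 2)))  # (('attB', 2), ('attP', 2))
print(recombine(("attB", 1), ("attP", 2)))  # None
```

This model ignores the recombination proteins required for each reaction and reaction efficiency. It is intended only to show how sites with distinct specificities keep multiple reactions independent of one another.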

Nucleic acid libraries supplied with kits of the invention may comprise cDNA or genomic DNA. Further, these libraries may comprise polynucleotides which encode variable domains of antibody light and heavy chains.

The invention also relates to compositions for carrying out the methods of the invention and to compositions created while carrying out the methods of the invention. In particular, the invention includes nucleic acid molecules prepared by methods of the invention, methods for preparing host cells which contain these nucleic acid molecules, host cells prepared by these methods, methods employing these host cells for producing products (e.g., RNA, protein, etc.) encoded by these nucleic acid molecules, and products encoded by these nucleic acid molecules (e.g., RNA, protein, etc.).

The compositions, methods and kits of the invention are preferably prepared and carried out using a phage-lambda site-specific recombination system and more preferably with the GATEWAY™ Recombinational Cloning System available from Invitrogen Corp. (Carlsbad, Calif.). The GATEWAY™ Cloning Technology Instruction Manual (Invitrogen Corp.) describes the systems in more detail and is incorporated herein by reference in its entirety.

Other preferred embodiments of the invention will be apparent to one of ordinary skill in the art in light of what is known in the art, in light of the following drawings and description of the invention, and in light of the claims.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a schematic representation of the basic recombinational cloning reaction.

FIG. 2 is a schematic representation of the use of the present invention to clone two nucleic acid segments by performing an LR recombination reaction.

FIG. 3 is a schematic representation of the use of the present invention to clone two nucleic acid segments by joining the segments using an LR reaction and then inserting the joined fragments into a Destination Vector using a BP recombination reaction.

FIG. 4 is a schematic representation of the use of the present invention to clone two nucleic acid segments by performing a BP reaction followed by an LR reaction.

FIG. 5 is a schematic representation of two nucleic acid segments having attB sites being cloned by performing a first BP reaction to generate an attL site on one segment and an attR on the other followed by an LR reaction to combine the segments. In variations of this process, P1, P2, and/or P3 can be oligonucleotides or linear stretches of nucleotides.

FIG. 6 is a schematic representation of the cloning of two nucleic acid segments into two separate sites in a Destination Vector using an LR reaction.

FIG. 7 is a schematic representation of the cloning of two nucleic acid segments into two separate sites in a vector using a BP reaction.

FIG. 8 is a schematic representation of the cloning of three nucleic acid segments into three vectors using BP reactions, cloning the three segments into a single vector using an LR reaction, and generating segments separated by attB sites.

FIG. 9 is a schematic representation of the cloning of three nucleic acid segments into a single vector using a BP reaction and generating segments separated by attR sites.

FIG. 10 is a schematic representation of adding one or more of the same or different molecules (nucleic acid, protein/peptide, carbohydrate, and/or other compounds) to a support (shaded box) by recombination. The open boxes represent recombination sites.

FIG. 11 is a schematic representation of joining multiple molecules and/or compounds (A and B). Labels used in this figure correspond to those in FIG. 10. The addition of A and B can be simultaneous or sequential.

FIG. 12 is a schematic representation of deleting a portion of a molecule or compound (A) from a support. Labels used in this figure correspond to those in FIG. 10.

FIG. 13 is a schematic representation of replacing a portion of a molecule or compound (A) with a second molecule or compound (C). Labels used in this figure correspond to those in FIG. 10.

FIG. 14A is a plasmid map showing a construct for providing a C-terminal fusion to a gene of interest. SupF encodes a suppressor function. Thus, when supF is expressed, a GUS-GST fusion protein is produced. In variations of this molecule, GUS can be any gene.

FIG. 14B is a schematic representation of a method for controlling both gene suppression and expression. The T7 RNA polymerase gene contains one or more (two are shown) amber stop codons (labeled “am”) in place of tyrosine codons. Leaky (uninduced) transcription from the inducible promoter makes insufficient supF to result in the production of active T7 RNA polymerase. Upon induction, sufficient supF is produced to make active T7 RNA polymerase, which results in increased expression of supF, which results in further increased expression of T7 RNA polymerase. The T7 RNA polymerase further induces expression of Gene. Further, expression of supF results in the addition of a C-terminal tag to the Gene expression product by suppression of the intervening amber stop codon.

FIG. 15 is a plasmid map showing a construct for the production of N- and/or C-terminal fusions of a gene of interest. Circled numbers represent amber, ochre, or opal stop codons. Suppression of these stop codons results in expression of fusion tags on the N-terminus, the C-terminus, or both termini. In the absence of suppression, native protein is produced.

FIG. 16 is a schematic representation of the single step insertion of four separate DNA segments into a Destination Vector using LR reactions. In particular, a first DNA segment having an attL1 site at the 5′ end and an attL3 site at the 3′ end is linked to a second DNA segment having an attR3 site at the 5′ end and an attL4 site at the 3′ end. The second DNA segment is then linked to a third DNA segment having an attR4 site at the 5′ end and an attL5 site at the 3′ end. The third DNA segment is then linked to a fourth DNA segment having an attR5 site at the 5′ end and an attL2 site at the 3′ end. Thus, upon reaction with LR CLONASE™, the first, second, third, and fourth DNA segments are inserted into a Destination Vector which contains a ccdB gene flanked by attR1 and attR2 sites. The inserted DNA segments are separated from each other and vector sequences by attB1, attB3, attB4, attB5, and attB2 sites.
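The ordering logic described for FIG. 16 can be sketched as simple bookkeeping: an attL site pairs only with an attR site of the same specificity, and each pairing leaves an attB site of that specificity at the junction. The following is a minimal sketch under that assumption; the segment labels (`DNA1` to `DNA4`) and the function names are hypothetical, and the code models only site pairing, not reaction chemistry or efficiency.

```python
import re

def parse(site):
    """Split a site name such as 'attL3' into its type ('L') and specificity ('3')."""
    m = re.fullmatch(r"att([BPLR])(\d+)", site)
    if m is None:
        raise ValueError(f"unrecognized att site: {site}")
    return m.group(1), m.group(2)

def lr_compatible(a, b):
    """True if one site is attL, the other attR, and both share a specificity."""
    type_a, spec_a = parse(a)
    type_b, spec_b = parse(b)
    return {type_a, type_b} == {"L", "R"} and spec_a == spec_b

def assemble(segments, vector_left, vector_right):
    """Order segments between the two vector sites and name each attB junction."""
    remaining = dict(segments)
    layout = []
    open_site = vector_left
    while remaining:
        matches = [name for name, (five, _) in remaining.items()
                   if lr_compatible(open_site, five)]
        if len(matches) != 1:
            raise ValueError(f"no unique partner for {open_site}")
        name = matches[0]
        five, three = remaining.pop(name)
        layout += ["attB" + parse(five)[1], name]
        open_site = three
    if not lr_compatible(open_site, vector_right):
        raise ValueError(f"{open_site} cannot pair with {vector_right}")
    layout.append("attB" + parse(open_site)[1])
    return "-".join(layout)

# Segment names are illustrative; the (5' site, 3' site) pairs follow FIG. 16.
segments = {
    "DNA1": ("attL1", "attL3"),
    "DNA2": ("attR3", "attL4"),
    "DNA3": ("attR4", "attL5"),
    "DNA4": ("attR5", "attL2"),
}
assembled = assemble(segments, "attR1", "attR2")
print(assembled)
```

Because each specificity appears on exactly one compatible pair of ends, the order of the four segments is fully determined, which is the basis for directional assembly in a single reaction.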

FIGS. 17A and 17B show schematic representations of the construction of a lux operon prepared according to the methods set out below in Example 18. In accordance with the invention, one or more genes of the operon can be replaced or deleted through recombination to construct one or more modified operons and then tested for activity and/or effect on host cells. Alternatively, other genes (including variants and mutants) can be used in the initial construction of the operon to replace one or more genes of interest, thereby producing one or more modified operons.

FIG. 18 is a schematic representation of the insertion of six separate DNA segments into a vector using a two step, one vector process. In particular, a first DNA segment (DNA-A) having an attL1 site at the 5′ end and an attL3 site at the 3′ end is linked to a second DNA segment (DNA-B) having an attR3 site at the 5′ end and an attL4 site at the 3′ end. The second DNA segment is then linked to a third DNA segment (DNA-C) having an attR4 site at the 5′ end and an attL5 site at the 3′ end. A fourth DNA segment (DNA-D) having an attR1 site at the 5′ end and an attL3 site at the 3′ end is linked to a fifth DNA segment (DNA-E) having an attR3 site at the 5′ end and an attL4 site at the 3′ end. The fifth DNA segment is then linked to a sixth DNA segment (DNA-F) having an attR4 site at the 5′ end and an attL2 site at the 3′ end. The two resulting molecules (i.e., DNA-A-DNA-B-DNA-C and DNA-D-DNA-E-DNA-F) are then inserted into the insertion vector. Each of the above reactions is catalyzed by LR CLONASE™. An LR reaction is also used to insert the joined DNA segments into a Destination Vector which contains a ccdB gene flanked by attR1 and attR2 sites. The inserted DNA segments are separated from each other and the vector by attB1, attB3, attB4, attB5, and attB2 sites. As described in FIG. 6, for example, the assembled segments may be inserted into contiguous or non-contiguous sites.

FIG. 19 is a schematic representation of the insertion of six separate DNA segments into a vector using a two step, two vector process. In particular, a first DNA segment (DNA-A) having an attB1 site at the 5′ end and an attL3 site at the 3′ end is linked to a second DNA segment (DNA-B) having an attR3 site at the 5′ end and an attL4 site at the 3′ end. The second DNA segment is then linked to a third DNA segment (DNA-C) having an attR4 site at the 5′ end and an attB5 site at the 3′ end. The linked DNA segments are then inserted into a vector which contains attP1 and attP5 sites. Further, a fourth DNA segment (DNA-D) having an attB5 site at the 5′ end and an attL3 site at the 3′ end is linked to a fifth DNA segment (DNA-E) having an attR3 site at the 5′ end and an attL4 site at the 3′ end. The fifth DNA segment is then linked to a sixth DNA segment (DNA-F) having an attR4 site at the 5′ end and an attB2 site at the 3′ end. The linked DNA segments are then inserted into a vector which contains attP1 and attP2 sites.

After construction of the two plasmids as described, each of which contains three inserted DNA segments, these plasmids are reacted with LR CLONASE™ to generate another plasmid which contains the six DNA segments flanked by attB sites (i.e., B1-DNA-A-B3-DNA-B-B4-DNA-C-B5-DNA-D-B3-DNA-E-B4-DNA-F-B2).

FIG. 20A is a schematic representation of an exemplary vector of the invention which contains two different DNA inserts, the transcription of which is driven in different directions by T7 promoters. Depending on the type of transcripts which are to be produced, DNA-A and/or DNA-B may be in an orientation which results in the production of either sense or anti-sense RNA.

FIG. 20B is a schematic representation of an exemplary vector of the invention which contains one DNA insert, the transcription of which is driven in two different directions by T7 promoters. Thus, RNA produced by transcription driven by one promoter will be sense RNA and RNA produced by transcription driven by the other promoter will be anti-sense RNA.

FIG. 20C is a schematic representation of an exemplary vector of the invention which contains two different DNA inserts having the same nucleotide sequence (i.e., DNA-A), the transcription of which is driven in different directions by two separate T7 promoters. In this example, RNA produced by transcription driven by one promoter will be sense RNA and RNA produced by transcription driven by the other promoter will be anti-sense RNA.

FIG. 20D is a schematic representation of an exemplary vector of the invention which contains two DNA inserts having the same nucleotide sequence (i.e., DNA-A) in opposite orientations, the transcription of which is driven by one T7 promoter. A transcription termination signal is not present between the two copies of DNA-A. Transcription of one segment produces a sense RNA and of the other produces an anti-sense RNA. The RNA produced from this vector will undergo intramolecular hybridization and, thus, will form a double-stranded molecule with a hairpin turn.

FIGS. 20E and 20F are schematic representations of exemplary vectors of the invention, each of which contains a DNA insert having the same nucleotide sequence (i.e., DNA-A). Transcription of these inserts results in the production of sense and anti-sense RNA which may then hybridize to form double stranded RNA molecules.

FIG. 21A is a schematic representation of an exemplary vector of the invention which contains three inserts, labeled “promoter,” “coding sequence,” and “Kan^(r).” In this example, the inserted promoter drives expression of the coding sequence. Further, an inserted DNA segment confers resistance to kanamycin upon host cells which contain the vector. As discussed below in more detail, a considerable number of vector components (e.g., a selectable marker (for example a kanamycin resistance gene) cassette, an ori cassette, a promoter cassette, a tag sequence cassette, and the like) can be inserted into or used to construct vectors of the invention.

FIG. 21B is a schematic representation of an exemplary vector of the invention which contains four inserts, labeled “promoter 1,” “coding sequence 1,” “promoter 2,” and “coding sequence 2.” In this example, promoter 1 drives expression of coding sequence 1 and promoter 2 drives expression of coding sequence 2.

FIG. 21C is a schematic representation of an exemplary vector of the invention for homologous recombination. This vector contains four inserts, labeled “5′ homology,” “NEO,” “DNA-A,” and “3′ homology.” The 5′ and 3′ homology regions, in this example, are homologous to a chromosomal region selected for insertion of a neomycin resistance marker (“NEO”) and a DNA segment (“DNA-A”). Targeting vectors of this type can be designed to insert, delete and/or replace nucleic acid present in targeted nucleic acid molecules.

FIGS. 22A and 22B show a schematic representation of processes for preparing targeting vectors of the invention.

FIG. 23 shows mRNA amplified with random-primed first strand reverse transcription, then with random-primed PCR. These amplification products are split into n pools, and each pool is amplified with random primers carrying a different pair of attB sites. The “R” suffix shows that some of the attB sites can be in inverted orientation. attB sites with either the standard or reverse orientations are used in separate pools to generate amplification products where the attB sites are linked in either standard or inverted orientation. When these sites react with inverted attP sites, attR sites are formed in the Entry Clones instead of attL sites. Hence, reacting pools with standard or inverted attR5 will generate mixtures of molecules flanked by attR and attL sites. The amplification products are sized by gel purification, then cloned with the GATEWAY™ BP reaction to make Entry Clones, each containing small inserts flanked by attL sites, attR sites, or attL and attR sites, depending on the orientation of the attB sites and attP sites used. When Entry Clones are mixed together, the inserts form a concatamer that can be cloned into a suitable Destination Vector, to give n inserts, each separated by an attB site. Sequencing a number of concatamers generates a profile of mRNA molecules present in the original sample.

FIGS. 24A-24C show the sequences of a number of att sites (SEQ ID NOs:1-36) suitable for use in methods and compositions of the invention.

FIGS. 25A-25B show a collection of Entry Clones which contain inserts including N-terminal tags or sequences (N-tag), open reading frames (ORF), C-terminal tags or sequences (C-tag), selectable markers (amp), origins of plasmid replication (ori) and other vector elements (for example a loxP site). Each Entry Clone vector element insert is flanked by attL or attR sites such that the vector elements can be linked together and form a new vector construct in an LR Clonase reaction (shown in FIG. 25B).

FIGS. 26A-26B show a process for constructing attP DONOR plasmids containing attP sites of any orientation and specificity. FIG. 26A shows four arrangements of attP sites in attP DONOR plasmids consisting of two orientations of direct repeat and two orientations of inverted repeat attP sites. The four attP DONOR plasmids shown in FIG. 26A can be used as templates for PCR reactions with PCR primers that would anneal specifically to the core of an attP site and thus create an attL or attR site of any desired specificity at the ends of the PCR products. For each new attP DONOR vector to be constructed, two such PCR products are generated, one consisting of the plasmid backbone (ori-kan) and a second consisting of the ccdB and cat genes. The PCR products are reacted together in LR Clonase reactions to generate new plasmids with attP sites of any orientation with any att site specificity.

FIG. 27A shows a process for linking two nucleic acid segments, A and B. The segments are cloned in two similarly configured plasmids. Each segment is flanked by two recombination sites. One of the recombination sites on each plasmid is capable of reacting with its cognate partner on the other plasmid, whereas the other two recombination sites do not react with any other site present. Each plasmid carries a unique origin of replication which may or may not be conditional. Each plasmid also carries both positive and negative selectable markers (+smX and smY, respectively) to enable selection against, and for, elements linked to a particular marker. Lastly, each plasmid carries a third recombination site (loxP in this example), suitably positioned to enable deletion of undesired elements and retention of desired elements. In this example, the two plasmids are initially fused at L2 and R2 via a Gateway L×R reaction. This results in the juxtaposition of segments A and B via a B2 recombination site, and the juxtaposition of sm1 and oriB via a P2 recombination site. The two loxP sites in the backbone that flank a series of plasmid elements are depicted in the second panel. Addition of the Cre protein will resolve the single large plasmid into two smaller ones. One of these will be the desired plasmid which carries the linked A and B segments with oriA now linked to sm2 and +sm4. The other carries a set of dispensable and/or undesirable elements. Transformation of an appropriate host and subsequent imposition of appropriate genetic selections will result in loss of the undesired plasmid, while the desired plasmid is maintained.

FIG. 27B shows a process for linking two chimeric nucleic acid segments, A-B and C-D, constructed as shown above in FIG. 27A. The segments are cloned in two similarly configured plasmids. Each segment is flanked by two recombination sites. One of these on each plasmid is capable of reacting with its cognate partner on the other plasmid, whereas the other two recombination sites do not react with any other site present. In this example, the two plasmids are initially fused at L2 and R2 via a Gateway L×R reaction. This results in the juxtaposition of segments A and B via a B2 recombination site, and the juxtaposition of sm1 and oriB via a P2 recombination site. The two loxP sites in the backbone that flank a series of plasmid elements are depicted in the second panel. Addition of the Cre protein will resolve the single large plasmid into two smaller ones. One of these will be the desired plasmid which carries the linked A-B and C-D segments with oriA now linked to sm2 and +sm4. The other carries a set of dispensable and/or undesirable elements. Transformation of an appropriate host and subsequent imposition of appropriate genetic selections will result in loss of the undesired plasmid, while the desired plasmid is maintained.

FIGS. 28A-B show the sequence of pDEST™R4-R3 (SEQ ID NO:156).

FIGS. 29A-B show the sequence of pDONR™221 (SEQ ID NO:157).

FIGS. 30A-B show the sequence of pDONR™P2R-P3 (SEQ ID NO:158).

FIGS. 31A-B show the sequence of pDONR™P4-P1R (SEQ ID NO:159).

FIGS. 32A-C show the sequence of pMS/GW (SEQ ID NO:160).

FIG. 33 shows vectors of a Two Fragment Modular Vector Construction Kit of the invention, as well as a recombination process using these vectors. This kit may be used to link DNA elements to the 5′ end of nucleic acid molecules comprising a recombination site (e.g., Gateway-adapted ORFs). The Entry clones of 5′ elements and ORFs are linked and assembled on the destination vector pDEST-R4R2 in a single LR reaction. The unique specificities of the different att sites allow for directional assembly of the Entry fragments.

FIG. 34 shows vectors of a Three Fragment Modular Vector Construction Kit of the invention, as well as a recombination process using these vectors. This kit allows DNA elements to be linked to the 5′ and 3′ ends of nucleic acid molecules comprising recombination sites (e.g., Gateway-adapted ORFs). 5′ and 3′ elements are linked and assembled on the destination vector pDEST-R4R3 in a single LR reaction. The 5′ and 3′ elements are supplied to the LR reaction as Entry clones.

FIG. 35 pEXP-AI-ssGUS was constructed using the entry clones pENTR AI and pENTR ssGUS in an LR Clonase reaction with the destination vector pDEST R4R2. Bacterial colonies transformed with either Entry clones alone or the Destination vector used in the assembly of pEXP-AI-ssGUS alone were determined to be negative for Gus activity within the assay parameters. (AI promoter: arabinose inducible promoter; ssGUS: Glucuronidase gene with a Shine-Dalgarno sequence and a translation stop codon).

FIG. 36 Bsr GI digestion of six pExp-AI-ssGUS Expression clones. The predicted fragments from this digestion are 3670 bp, 1167 bp, 426 bp and 279 bp. Lanes 2 and 9 are 1 kb-plus-DNA markers. Lanes 3 to 8 are Bsr GI digested mini-prep DNA. A 1.2% E-Gel was used for the separation of the digested fragments.

FIG. 37 pExp-AI-ssGUS-ss αlacZ19, a polycistronic expression clone, was assembled with the Entry clones pENTR AI, pENTR ssGUS and pENTR ss αlacZ19 in a single LR reaction with the Destination vector pDEST R4R3. ss αlacZ19: alpha lacZ fragment from pUC19 with a Shine-Dalgarno sequence and a translation stop codon.

FIG. 38 Bsr GI digestion of six pExp-AI-ssGUS-ss αlacZ19 Expression clones. The predicted fragments from this digestion are 4026 bp, 1167 bp, 426 bp and 279 bp. Lanes 2 and 9 are 1 kb-plus-DNA markers. Lanes 3 to 8 are Bsr GI digested mini-prep plasmid DNA. All samples showed the desired profile. The extra fragment in lane 8 was later proven to be a partial digest fragment.
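The predicted fragment sizes given for FIGS. 36 and 38 can be cross-checked with simple arithmetic. The sketch below assumes that each expression clone is a circular plasmid digested to completion, so that the fragment sizes sum to the plasmid size; that assumption and the variable names are illustrative and are not stated in the figure descriptions.

```python
# Predicted Bsr GI fragments (bp) as listed for FIG. 36 and FIG. 38.
fig36_fragments = [3670, 1167, 426, 279]   # pExp-AI-ssGUS
fig38_fragments = [4026, 1167, 426, 279]   # pExp-AI-ssGUS-ss alpha-lacZ19

# For a fully digested circular plasmid, the fragments sum to the plasmid size.
size36 = sum(fig36_fragments)
size38 = sum(fig38_fragments)

# Fragments common to both digests, and the size difference between the clones.
shared = sorted(set(fig36_fragments) & set(fig38_fragments), reverse=True)
difference = size38 - size36

print(size36, size38, shared, difference)
```

Under this assumption, the two clones share three fragments and differ only in the largest one, which is consistent with the additional element residing within that fragment.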

FIG. 39 Effects of spermidine concentration on the linking of three Entry clones in an LR reaction. Transformants from this reaction were scored against the final spermidine concentration. Several titration experiments were conducted; however, only one is depicted in the graph. All the experiments suggested a peak activity of between 7 and 10 mM spermidine, but due to the variability of the colony count assay, compiling all results onto one graph was not feasible. The final concentration of spermidine in many Gateway LR reactions may be about 4.5 mM.

FIG. 40 The effects of spermidine concentration between 7.5 and 10 mM in MultiSite LR reactions. Results from two separate experiments are depicted in the graph.

FIG. 41A is a schematic diagram of vector pDONR5′.

FIG. 41B is a schematic diagram of vector pDONR3′. In particular embodiments, a spectinomycin resistance marker may be present instead of or in addition to the chloramphenicol resistance marker shown in this figure (abbreviated “cmr”).

FIG. 41C is a schematic diagram of vector pDESTR4R2.

FIG. 41D is a schematic diagram of vector pDESTR4R3.

FIG. 41E is a schematic diagram of vector pMVC Control.

FIG. 42A depicts a BP reaction, where recombination of an attB substrate (e.g., attB PCR product or expression clone) with an attP substrate (donor vector) creates an attL-containing entry clone.

FIG. 42B depicts an LR reaction, where recombination of an attL-containing entry clone with an attR-containing destination vector creates an attB-containing expression clone.
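The site conversions in the BP and LR reactions of FIGS. 42A and 42B can be summarized as a small lookup: attB×attP yields attL and attR, and attL×attR yields attB and attP, provided both sites share the same specificity. The following is a minimal bookkeeping sketch; the function name `recombine` and the site-naming convention (`att` + type letter + specificity number) are assumptions made for illustration only.

```python
# Product site types for each reactive pairing of att site types.
PRODUCTS = {
    frozenset("BP"): ("L", "R"),   # BP reaction: attB x attP -> attL + attR
    frozenset("LR"): ("B", "P"),   # LR reaction: attL x attR -> attB + attP
}

def recombine(site_a, site_b):
    """Return the two product sites, or None if the pair is not expected to react.

    Sites are named like 'attB1': the fourth character is the site type and
    the remainder is the specificity.
    """
    type_a, spec_a = site_a[3], site_a[4:]
    type_b, spec_b = site_b[3], site_b[4:]
    if spec_a != spec_b:
        return None                # different specificities do not recombine
    products = PRODUCTS.get(frozenset((type_a, type_b)))
    if products is None:
        return None                # e.g. attB x attB is not a BP or LR pairing
    return tuple(f"att{t}{spec_a}" for t in products)

print(recombine("attB1", "attP1"))
print(recombine("attL2", "attR2"))
print(recombine("attL1", "attR2"))
```

Keeping specificity as part of the site name is what allows several such reactions to proceed in one mixture without cross-reaction, as the multi-fragment figures rely on.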

FIG. 43 is a diagram showing three entry clones in a single MultiSite Gateway LR recombination reaction with a specially designed destination vector, pDEST™R4-R3.

FIG. 44 depicts the recombination reaction between an attB4 and attB1-flanked PCR product and pDONR™P4-P1R to create an entry clone and a by-product (SEQ ID NO:161, SEQ ID NO:162, SEQ ID NO:163, SEQ ID NO:164).

FIG. 45 depicts the recombination reaction between an attB1 and attB2-flanked PCR product and pDONR™221 to create an entry clone and a by-product (SEQ ID NO:165, SEQ ID NO:166, SEQ ID NO:167, SEQ ID NO:168).

FIG. 46 depicts the recombination reaction between an attB2 and attB3-flanked PCR product and pDONR™P2R-P3 to create an entry clone and a by-product (SEQ ID NO:169, SEQ ID NO:170, SEQ ID NO:171, SEQ ID NO:172).

FIG. 47A depicts the generation of an attL4 and attR1-flanked entry clone containing a 5′ element of interest.

FIG. 47B depicts the generation of an attR2 and attL3-flanked entry clone containing a 3′ element of interest.

FIG. 48 depicts various methods of generating an entry clone.

FIG. 49 depicts attB forward primers (SEQ ID NO:173, SEQ ID NO:174, SEQ ID NO:175).

FIG. 50 depicts attB reverse primers (SEQ ID NO:176, SEQ ID NO:177, SEQ ID NO:178).

FIG. 51A depicts the recombination region of the entry clone resulting from pDONR™P4-P1R×attB4-5′ element-attB1 (SEQ ID NO:179).

FIG. 51B depicts the recombination region of the entry clone resulting from pDONR™221×attB1-gene of interest-attB2 (SEQ ID NO:180).

FIG. 51C depicts the recombination region of the entry clone resulting from pDONR™P2R-P3×attB2-3′ element-attB3 (SEQ ID NO:181).

FIG. 52 depicts the recombination region of the expression clone resulting from pDEST™R4-R3×attL4-5′ entry clone-attR1×attL1-entry clone-attR2-3′ entry clone-attL3 (SEQ ID NO:182).

FIG. 53 is a vector map of pDONR™P4-P1R.

FIG. 54 is a vector map of pDONR™221.

FIG. 55 is a vector map of pDONR™P2R-P3.

FIG. 56 is a vector map of pDEST™R4-R3.

FIG. 57 is a vector map of pMS/GW.

DETAILED DESCRIPTION OF THE INVENTION

Definitions

In the description that follows, a number of terms used in recombinant nucleic acid technology are utilized extensively. In order to provide a clear and more consistent understanding of the specification and claims, including the scope to be given such terms, the following definitions are provided.

Gene: As used herein, the term “gene” refers to a nucleic acid which contains information necessary for expression of a polypeptide, protein, or untranslated RNA (e.g., rRNA, tRNA, anti-sense RNA). When the gene encodes a protein, it includes the promoter and the structural gene open reading frame sequence (ORF), as well as other sequences involved in expression of the protein. Of course, as would be clearly apparent to one skilled in the art, the transcriptional and translational machinery required for production of the gene product is not included within the definition of a gene. When the gene encodes an untranslated RNA, it includes the promoter and the nucleic acid which encodes the untranslated RNA.

Structural Gene: As used herein, the phrase “structural gene” refers to a nucleic acid which is transcribed into messenger RNA that is then translated into a sequence of amino acids characteristic of a specific polypeptide.

Host: As used herein, the term “host” refers to any prokaryotic or eukaryotic organism that is a recipient of a replicable expression vector, cloning vector or any nucleic acid molecule. The nucleic acid molecule may contain, but is not limited to, a structural gene, a transcriptional regulatory sequence (such as a promoter, enhancer, repressor, and the like) and/or an origin of replication. As used herein, the terms “host,” “host cell,” “recombinant host” and “recombinant host cell” may be used interchangeably. For examples of such hosts, see Maniatis et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1982).

Transcriptional Regulatory Sequence: As used herein, the phrase “transcriptional regulatory sequence” refers to a functional stretch of nucleotides contained on a nucleic acid molecule, in any configuration or geometry, that acts to regulate the transcription of (1) one or more structural genes (e.g., two, three, four, five, seven, ten, etc.) into messenger RNA or (2) one or more genes into untranslated RNA. Examples of transcriptional regulatory sequences include, but are not limited to, promoters, enhancers, repressors, and the like.

Promoter: As used herein, a promoter is an example of a transcriptional regulatory sequence, and is specifically a nucleic acid generally described as the 5′-region of a gene located proximal to the start codon or nucleic acid which encodes untranslated RNA. The transcription of an adjacent nucleic acid segment is initiated at the promoter region. A repressible promoter's rate of transcription decreases in response to a repressing agent. An inducible promoter's rate of transcription increases in response to an inducing agent. A constitutive promoter's rate of transcription is not specifically regulated, though it can vary under the influence of general metabolic conditions.

Insert: As used herein, the term “insert” refers to a desired nucleic acid segment that is a part of a larger nucleic acid molecule. In many instances, the insert will be introduced into the larger nucleic acid molecule. For example, the nucleic acid segments labeled ccdB and DNA-A in FIG. 2 are nucleic acid inserts with respect to the larger nucleic acid molecule shown therein. In most instances, the insert will be flanked by recombination sites (e.g., at least one recombination site at each end). In certain embodiments, however, the insert will only contain a recombination site on one end.

Target Nucleic Acid Molecule: As used herein, the phrase “target nucleic acid molecule” refers to a nucleic acid segment of interest, preferably nucleic acid which is to be acted upon using the compounds and methods of the present invention. Such target nucleic acid molecules preferably contain one or more genes (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) or portions of genes.

Insert Donor: As used herein, the phrase “Insert Donor” refers to one of the two parental nucleic acid molecules (e.g., RNA or DNA) of the present invention which carries the Insert (see FIG. 1). The Insert Donor molecule comprises the Insert flanked on both sides with recombination sites. The Insert Donor can be linear or circular. In one embodiment of the invention, the Insert Donor is a circular nucleic acid molecule, optionally supercoiled, and further comprises a cloning vector sequence outside of the recombination signals. When a population of Inserts or population of nucleic acid segments is used to make the Insert Donor, a population of Insert Donors results and may be used in accordance with the invention.

Product: As used herein, the term “Product” refers to one of the desired daughter molecules comprising the A and D sequences which is produced after the second recombination event during the recombinational cloning process (see FIG. 1). The Product contains the nucleic acid which was to be cloned or subcloned. In accordance with the invention, when a population of Insert Donors is used, the resulting population of Product molecules will contain all or a portion of the population of Inserts of the Insert Donors and preferably will contain a representative population of the original molecules of the Insert Donors.

Byproduct: As used herein, the term “Byproduct” refers to a daughter molecule (a new clone produced after the second recombination event during the recombinational cloning process) lacking the segment which is desired to be cloned or subcloned.

Cointegrate: As used herein, the term “Cointegrate” refers to at least one recombination intermediate nucleic acid molecule of the present invention that contains both parental (starting) molecules. Cointegrates may be linear or circular. RNA and polypeptides may be expressed from cointegrates using an appropriate host cell strain, for example E. coli DB3.1 (particularly E. coli LIBRARY EFFICIENCY® DB3.1™ Competent Cells), and selecting for both selection markers found on the cointegrate molecule.

Recognition Sequence: As used herein, the phrase “recognition sequence” refers to a particular sequence that a protein, chemical compound, DNA, or RNA molecule (e.g., restriction endonuclease, a modification methylase, or a recombinase) recognizes and binds. In the present invention, a recognition sequence will usually refer to a recombination site. For example, the recognition sequence for Cre recombinase is loxP, which is a 34 base pair sequence comprising two 13 base pair inverted repeats (serving as the recombinase binding sites) flanking an 8 base pair core sequence. (See FIG. 1 of Sauer, B., Current Opinion in Biotechnology 5:521-527 (1994).) Other examples of recognition sequences are the attB, attP, attL, and attR sequences which are recognized by the recombinase enzyme λ Integrase. attB is an approximately 25 base pair sequence containing two 9 base pair core-type Int binding sites and a 7 base pair overlap region. attP is an approximately 240 base pair sequence containing core-type Int binding sites and arm-type Int binding sites as well as sites for auxiliary proteins integration host factor (IHF), FIS and excisionase (Xis). (See Landy, Current Opinion in Biotechnology 3:699-707 (1993).) Such sites may also be engineered according to the present invention to enhance production of products in the methods of the invention. For example, when such engineered sites lack the P1 or H1 domains to make the recombination reactions irreversible (e.g., attR or attP), such sites may be designated attR′ or attP′ to show that the domains of these sites have been modified in some way.
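The loxP architecture described above (two 13 base pair inverted repeats flanking an 8 base pair core, 34 base pairs in total) can be checked directly on a sequence. The sketch below uses the commonly cited wild-type loxP sequence, which is an assumption brought in for illustration and is not reproduced from this specification; the helper name `reverse_complement` is likewise hypothetical.

```python
# Commonly cited wild-type loxP sequence (assumed here; not taken from this document).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Split the site into left repeat (13 bp), core (8 bp), and right repeat (13 bp).
left, core, right = LOXP[:13], LOXP[13:21], LOXP[21:]

# Two arms form an inverted repeat when one is the reverse complement of the other.
is_inverted_repeat = reverse_complement(left) == right

print(len(LOXP), len(left), len(core), len(right), is_inverted_repeat)
```

The same decomposition could be applied to an attB site (two 9 base pair core-type binding sites and a 7 base pair overlap region), although the att core sites are not required to be perfect inverted repeats.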

Recombination Proteins: As used herein, the phrase “recombination proteins” includes excisive or integrative proteins, enzymes, co-factors or associated proteins that are involved in recombination reactions involving one or more recombination sites (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.), which may be wild-type proteins (see Landy, Current Opinion in Biotechnology 3:699-707 (1993)), or mutants, derivatives (e.g., fusion proteins containing the recombination protein sequences or fragments thereof), fragments, and variants thereof. Examples of recombination proteins include Cre, Int, IHF, Xis, Flp, Fis, Hin, Gin, ΦC31, Cin, Tn3 resolvase, TndX, XerC, XerD, TnpX, Hjc, Gin, SpCCE1, and ParA.

Recombination Site: As used herein, the phrase “recombination site” refers to a recognition sequence on a nucleic acid molecule which participates in an integration/recombination reaction by recombination proteins. Recombination sites are discrete sections or segments of nucleic acid on the participating nucleic acid molecules that are recognized and bound by a site-specific recombination protein during the initial stages of integration or recombination. For example, the recombination site for Cre recombinase is loxP, which is a 34 base pair sequence comprised of two 13 base pair inverted repeats (serving as the recombinase binding sites) flanking an 8 base pair core sequence. (See FIG. 1 of Sauer, B., Curr. Opin. Biotech. 5:521-527 (1994).) Other examples of recognition sequences include the attB, attP, attL, and attR sequences described herein, and mutants, fragments, variants and derivatives thereof, which are recognized by the recombination protein λ Int and by the auxiliary proteins integration host factor (IHF), FIS and excisionase (Xis). (See Landy, Curr. Opin. Biotech. 3:699-707 (1993).)

Recombination sites may be added to molecules by any number of known methods. For example, recombination sites can be added to nucleic acid molecules by blunt end ligation, PCR performed with fully or partially random primers, or inserting the nucleic acid molecules into a vector using a restriction site which is flanked by recombination sites.

Recombinational Cloning: As used herein, the phrase “recombinational cloning” refers to a method, such as that described in U.S. Pat. Nos. 5,888,732 and 6,143,557 (the contents of which are fully incorporated herein by reference), whereby segments of nucleic acid molecules or populations of such molecules are exchanged, inserted, replaced, substituted or modified, in vitro or in vivo. Preferably, such cloning method is an in vitro method.

Repression Cassette: As used herein, the phrase “repression cassette” refers to a nucleic acid segment that contains a repressor or a selectable marker present in the subcloning vector.

Selectable Marker: As used herein, the phrase “selectable marker” refers to a nucleic acid segment that allows one to select for or against a molecule (e.g., a replicon) or a cell that contains it, often under particular conditions. These markers can encode an activity, such as, but not limited to, production of RNA, peptide, or protein, or can provide a binding site for RNA, peptides, proteins, inorganic and organic compounds or compositions and the like. Examples of selectable markers include but are not limited to: (1) nucleic acid segments that encode products which provide resistance against otherwise toxic compounds (e.g., antibiotics); (2) nucleic acid segments that encode products which are otherwise lacking in the recipient cell (e.g., tRNA genes, auxotrophic markers); (3) nucleic acid segments that encode products which suppress the activity of a gene product; (4) nucleic acid segments that encode products which can be readily identified (e.g., phenotypic markers such as β-galactosidase, green fluorescent protein (GFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), cyan fluorescent protein (CFP), and cell surface proteins); (5) nucleic acid segments that bind products which are otherwise detrimental to cell survival and/or function; (6) nucleic acid segments that otherwise inhibit the activity of any of the nucleic acid segments described in Nos. 1-5 above (e.g., antisense oligonucleotides); (7) nucleic acid segments that bind products that modify a substrate (e.g., restriction endonucleases); (8) nucleic acid segments that can be used to isolate or identify a desired molecule (e.g., specific protein binding sites); (9) nucleic acid segments that encode a specific nucleotide sequence which can be otherwise non-functional (e.g., for PCR amplification of subpopulations of molecules); (10) nucleic acid segments, which when absent, directly or indirectly confer resistance or sensitivity to particular compounds; (11) nucleic acid segments that encode products which either are toxic (e.g., Diphtheria toxin) or convert a relatively non-toxic compound to a toxic compound (e.g., Herpes simplex thymidine kinase, cytosine deaminase) in recipient cells; (12) nucleic acid segments that inhibit replication, partition or heritability of nucleic acid molecules that contain them; and/or (13) nucleic acid segments that encode conditional replication functions, e.g., replication in certain hosts or host cell strains or under certain environmental conditions (e.g., temperature, nutritional conditions, etc.).

Selection Scheme: As used herein, the phrase “selection scheme” refers to any method which allows selection, enrichment, or identification of desired nucleic acid molecules or host cells containing them (in particular Product or Product(s) from a mixture containing an Entry Clone or Vector, a Destination Vector, a Donor Vector, an Expression Clone or Vector, any intermediates (e.g., a Cointegrate or a replicon), and/or Byproducts). In one aspect, selection schemes of the invention rely on one or more selectable markers. The selection schemes of one embodiment have at least two components that are either linked or unlinked during recombinational cloning. One component is a selectable marker. The other component controls the expression in vitro or in vivo of the selectable marker, or survival of the cell (or the nucleic acid molecule, e.g., a replicon) harboring the plasmid carrying the selectable marker. Generally, this controlling element will be a repressor or inducer of the selectable marker, but other means for controlling expression or activity of the selectable marker can be used. Whether a repressor or activator is used will depend on whether the marker is for a positive or negative selection, and the exact arrangement of the various nucleic acid segments, as will be readily apparent to those skilled in the art. In some preferred embodiments, the selection scheme results in selection of or enrichment for only one or more desired nucleic acid molecules (such as Products). As defined herein, selecting for a nucleic acid molecule includes (a) selecting or enriching for the presence of the desired nucleic acid molecule (referred to as a “positive selection scheme”), and (b) selecting or enriching against the presence of nucleic acid molecules that are not the desired nucleic acid molecule (referred to as a “negative selection scheme”).

In one embodiment, the selection schemes (which can be carried out in reverse) will take one of three forms, which will be discussed in terms of FIG. 1. The first, exemplified herein with a selectable marker and a repressor therefor, selects for molecules having segment D and lacking segment C. The second selects against molecules having segment C and for molecules having segment D. Possible embodiments of the second form would have a nucleic acid segment carrying a gene toxic to cells into which the in vitro reaction products are to be introduced. A toxic gene can be a nucleic acid that is expressed as a toxic gene product (a toxic protein or RNA), or can be toxic in and of itself. (In the latter case, the toxic gene is understood to carry its classical definition of “heritable trait”.)

Examples of such toxic gene products are well known in the art, and include, but are not limited to, restriction endonucleases (e.g., DpnI, Nla3, etc.); apoptosis-related genes (e.g., ASK1 or members of the bcl-2/ced-9 family); retroviral genes, including those of the human immunodeficiency virus (HIV); defensins such as NP-1; inverted repeats or paired palindromic nucleic acid sequences; bacteriophage lytic genes such as those from ΦX174 or bacteriophage T4; antibiotic sensitivity genes such as rpsL; antimicrobial sensitivity genes such as pheS; plasmid killer genes; eukaryotic transcriptional vector genes that produce a gene product toxic to bacteria, such as GATA-1; genes that kill hosts in the absence of a suppressing function, e.g., kicB, ccdB, ΦX174 E (Liu, Q. et al., Curr. Biol. 8:1300-1309 (1998)); and other genes that negatively affect replicon stability and/or replication. A toxic gene can alternatively be selectable in vitro, e.g., a restriction site.

Many genes coding for restriction endonucleases operably linked to inducible promoters are known, and may be used in the present invention. (See, e.g., U.S. Pat. No. 4,960,707 (DpnI and DpnII); U.S. Pat. Nos. 5,000,333, 5,082,784 and 5,192,675 (KpnI); U.S. Pat. No. 5,147,800 (NgoAIII and NgoAI); U.S. Pat. No. 5,179,015 (FspI and HaeIII); U.S. Pat. No. 5,200,333 (HaeII and TaqI); U.S. Pat. No. 5,248,605 (HpaII); U.S. Pat. No. 5,312,746 (ClaI); U.S. Pat. Nos. 5,231,021 and 5,304,480 (XhoI and XhoII); U.S. Pat. No. 5,334,526 (AluI); U.S. Pat. No. 5,470,740 (NsiI); U.S. Pat. No. 5,534,428 (SstI/SacI); U.S. Pat. No. 5,202,248 (NcoI); U.S. Pat. No. 5,139,942 (NdeI); and U.S. Pat. No. 5,098,839 (PacI).) (See also Wilson, G. G., Nucl. Acids Res. 19:2539-2566 (1991); and Lunnen, K. D., et al., Gene 74:25-32 (1988).)

In the second form, segment D carries a selectable marker. The toxic gene would eliminate transformants harboring the Vector Donor, Cointegrate, and Byproduct molecules, while the selectable marker can be used to select for cells containing the Product and against cells harboring only the Insert Donor.
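The logic of the second form can be sketched as a simple filter over the molecules present after a recombination reaction. This is an illustration under stated assumptions: the toxic gene is placed on segment C and the selectable marker on segment D as described above, while segment B (the part of the Insert Donor not carried into the Product), the segment composition assigned to each molecule, and the simplification that each transformed cell carries a single molecule are assumptions made for the example.

```python
# Illustrative sketch of the second selection form. Segment labels follow the
# FIG. 1 convention used in the text; segment B and the composition of each
# molecule are assumptions made for this example.
MOLECULES = {
    "Insert Donor": {"A", "B"},
    "Vector Donor": {"C", "D"},
    "Cointegrate": {"A", "B", "C", "D"},
    "Byproduct": {"B", "C"},
    "Product": {"A", "D"},
}


def survives(segments, toxic="C", marker="D"):
    """A transformant survives if it has the marker and lacks the toxic gene."""
    return marker in segments and toxic not in segments


def select(molecules):
    """Return the names of molecules whose host cells survive selection."""
    return [name for name, segments in molecules.items() if survives(segments)]


if __name__ == "__main__":
    print(select(MOLECULES))  # ['Product']
```

In this simplified model the toxic gene removes every molecule that retains segment C, and the marker on segment D removes the Insert Donor, leaving only the Product.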

The third form selects for cells that have both segments A and D in cis on the same molecule, but not for cells that have both segments in trans on different molecules. This could be embodied by a selectable marker that is split into two inactive fragments, one each on segments A and D.

The fragments are so arranged relative to the recombination sites that when the segments are brought together by the recombination event, they reconstitute a functional selectable marker. For example, the recombinational event can link a promoter with a structural nucleic acid molecule (e.g., a gene), can link two fragments of a structural nucleic acid molecule, or can link nucleic acid molecules that encode a heterodimeric gene product needed for survival, or can link portions of a replicon.

Site-Specific Recombinase: As used herein, the phrase “site-specific recombinase” refers to a type of recombinase which typically has at least the following four activities (or combinations thereof): (1) recognition of specific nucleic acid sequences; (2) cleavage of said sequence or sequences; (3) topoisomerase activity involved in strand exchange; and (4) ligase activity to reseal the cleaved strands of nucleic acid. (See Sauer, B., Current Opinion in Biotechnology 5:521-527 (1994).) Conservative site-specific recombination is distinguished from homologous recombination and transposition by a high degree of sequence specificity for both partners. The strand exchange mechanism involves the cleavage and rejoining of specific nucleic acid sequences in the absence of DNA synthesis (Landy, A. (1989) Ann. Rev. Biochem. 58:913-949).

Homologous Recombination: As used herein, the phrase “homologous recombination” refers to the process in which nucleic acid molecules with similar nucleotide sequences associate and exchange nucleotide strands. A nucleotide sequence of a first nucleic acid molecule which is effective for engaging in homologous recombination at a predefined position of a second nucleic acid molecule will therefore have a nucleotide sequence which facilitates the exchange of nucleotide strands between the first nucleic acid molecule and a defined position of the second nucleic acid molecule. Thus, the first nucleic acid will generally have a nucleotide sequence which is sufficiently complementary to a portion of the second nucleic acid molecule to promote nucleotide base pairing.

Homologous recombination requires homologous sequences in the two recombining partner nucleic acids but does not require any specific sequences. As indicated above, site-specific recombination which occurs, for example, at recombination sites such as att sites, is not considered to be “homologous recombination,” as the phrase is used herein.

Vector: As used herein, the term “vector” refers to a nucleic acid molecule (preferably DNA) that provides a useful biological or biochemical property to an insert. Examples include plasmids, phages, autonomously replicating sequences (ARS), centromeres, and other sequences which are able to replicate or be replicated in vitro or in a host cell, or to convey a desired nucleic acid segment to a desired location within a host cell. A vector can have one or more restriction endonuclease recognition sites (e.g., two, three, four, five, seven, ten, etc.) at which the sequences can be cut in a determinable fashion without loss of an essential biological function of the vector, and into which a nucleic acid fragment can be spliced in order to bring about its replication and cloning. Vectors can further provide primer sites (e.g., for PCR), transcriptional and/or translational initiation and/or regulation sites, recombinational signals, replicons, selectable markers, etc. Clearly, methods of inserting a desired nucleic acid fragment which do not require the use of recombination, transpositions or restriction enzymes (such as, but not limited to, uracil N-glycosylase (UDG) cloning of PCR fragments (U.S. Pat. Nos. 5,334,575 and 5,888,795, both of which are entirely incorporated herein by reference), T:A cloning, and the like) can also be applied to clone a fragment into a cloning vector to be used according to the present invention. The cloning vector can further contain one or more selectable markers (e.g., two, three, four, five, seven, ten, etc.) suitable for use in the identification of cells transformed with the cloning vector.

Subcloning Vector: As used herein, the phrase “subcloning vector” refers to a cloning vector comprising a circular or linear nucleic acid molecule which includes, preferably, an appropriate replicon. In the present invention, the subcloning vector (segment D in FIG. 1) can also contain functional and/or regulatory elements that are desired to be incorporated into the final product to act upon or with the cloned nucleic acid insert (segment A in FIG. 1). The subcloning vector can also contain a selectable marker (preferably DNA).

Vector Donor: As used herein, the phrase “Vector Donor” refers to one of the two parental nucleic acid molecules (e.g., RNA or DNA) of the present invention which carries the nucleic acid segments comprising the nucleic acid vector which is to become part of the desired Product. The Vector Donor comprises a subcloning vector D (or it can be called the cloning vector if the Insert Donor does not already contain a cloning vector) and a segment C flanked by recombination sites (see FIG. 1). Segments C and/or D can contain elements that contribute to selection for the desired Product daughter molecule, as described above for selection schemes. The recombination signals can be the same or different, and can be acted upon by the same or different recombinases. In addition, the Vector Donor can be linear or circular.

Primer: As used herein, the term “primer” refers to a single stranded or double stranded oligonucleotide that is extended by covalent bonding of nucleotide monomers during amplification or polymerization of a nucleic acid molecule (e.g., a DNA molecule). In one aspect, the primer may be a sequencing primer (for example, a universal sequencing primer). In another aspect, the primer may comprise a recombination site or portion thereof.

Adapter: As used herein, the term “adapter” refers to an oligonucleotide or nucleic acid fragment or segment (preferably DNA) which comprises one or more recombination sites (or portions of such recombination sites) which in accordance with the invention can be added to a circular or linear Insert Donor molecule as well as other nucleic acid molecules described herein. When using portions of recombination sites, the missing portion may be provided by the Insert Donor molecule. Such adapters may be added at any location within a circular or linear molecule, although the adapters are preferably added at or near one or both termini of a linear molecule. Preferably, adapters are positioned to be located on both sides of (flanking) a particular nucleic acid molecule of interest. In accordance with the invention, adapters may be added to nucleic acid molecules of interest by standard recombinant techniques (e.g., restriction digest and ligation). For example, adapters may be added to a circular molecule by first digesting the molecule with an appropriate restriction enzyme, adding the adapter at the cleavage site and reforming the circular molecule which contains the adapter(s) at the site of cleavage. In other aspects, adapters may be added by homologous recombination, by integration of RNA molecules, and the like. Alternatively, adapters may be ligated directly to one or more and preferably both termini of a linear molecule thereby resulting in linear molecule(s) having adapters at one or both termini. In one aspect of the invention, adapters may be added to a population of linear molecules (e.g., a cDNA library or genomic DNA which has been cleaved or digested) to form a population of linear molecules containing adapters at one and preferably both termini of all or a substantial portion of said population.

Adapter-Primer: As used herein, the phrase “adapter-primer” refers to a primer molecule which comprises one or more recombination sites (or portions of such recombination sites) which in accordance with the invention can be added to a circular or linear nucleic acid molecule described herein. When using portions of recombination sites, the missing portion may be provided by a nucleic acid molecule (e.g., an adapter) of the invention. Such adapter-primers may be added at any location within a circular or linear molecule, although the adapter-primers are preferably added at or near one or both termini of a linear molecule. Examples of such adapter-primers and the use thereof in accordance with the methods of the invention are shown in Example 8 herein. Such adapter-primers may be used to add one or more recombination sites or portions thereof to circular or linear nucleic acid molecules in a variety of contexts and by a variety of techniques, including but not limited to amplification (e.g., PCR), ligation (e.g., enzymatic or chemical/synthetic ligation), recombination (e.g., homologous or non-homologous (illegitimate) recombination) and the like.

Template: As used herein, the term “template” refers to a double stranded or single stranded nucleic acid molecule which is to be amplified, synthesized or sequenced. In the case of a double-stranded DNA molecule, denaturation of its strands to form a first and a second strand is preferably performed before these molecules may be amplified, synthesized or sequenced, or the double stranded molecule may be used directly as a template. For single stranded templates, a primer complementary to at least a portion of the template hybridizes under appropriate conditions and one or more polypeptides having polymerase activity (e.g., two, three, four, five, or seven DNA polymerases and/or reverse transcriptases) may then synthesize a molecule complementary to all or a portion of the template. Alternatively, for double stranded templates, one or more transcriptional regulatory sequences (e.g., two, three, four, five, seven or more promoters) may be used in combination with one or more polymerases to make nucleic acid molecules complementary to all or a portion of the template. The newly synthesized molecule, according to the invention, may be of equal or shorter length compared to the original template. Mismatch incorporation or strand slippage during the synthesis or extension of the newly synthesized molecule may result in one or a number of mismatched base pairs. Thus, the synthesized molecule need not be exactly complementary to the template. Additionally, a population of nucleic acid templates may be used during synthesis or amplification to produce a population of nucleic acid molecules typically representative of the original template population.

Incorporating: As used herein, the term “incorporating” means becoming a part of a nucleic acid (e.g., DNA) molecule or primer.

Library: As used herein, the term “library” refers to a collection of nucleic acid molecules (circular or linear). In one embodiment, a library may comprise a plurality of nucleic acid molecules (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, one hundred, two hundred, five hundred, one thousand, five thousand, or more), which may or may not be from a common source organism, organ, tissue, or cell. In another embodiment, a library is representative of all or a portion or a significant portion of the nucleic acid content of an organism (a “genomic” library), or a set of nucleic acid molecules representative of all or a portion or a significant portion of the expressed nucleic acid molecules (a cDNA library or segments derived therefrom) in a cell, tissue, organ or organism. A library may also comprise nucleic acid molecules having random sequences made by de novo synthesis, mutagenesis of one or more nucleic acid molecules, and the like. Such libraries may or may not be contained in one or more vectors (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.).

Amplification: As used herein, the term “amplification” refers to any in vitro method for increasing the number of copies of a nucleic acid molecule with the use of one or more polypeptides having polymerase activity (e.g., one, two, three, four or more nucleic acid polymerases or reverse transcriptases). Nucleic acid amplification results in the incorporation of nucleotides into a DNA and/or RNA molecule or primer thereby forming a new nucleic acid molecule complementary to a template. The formed nucleic acid molecule and its template can be used as templates to synthesize additional nucleic acid molecules. As used herein, one amplification reaction may consist of many rounds of nucleic acid replication. DNA amplification reactions include, for example, polymerase chain reaction (PCR). One PCR reaction may consist of 5 to 100 cycles of denaturation and synthesis of a DNA molecule.
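Because each newly formed molecule and its template can both serve as templates in the next round, copy number grows geometrically with cycle number. The sketch below uses the standard idealized model, copies = initial × (1 + efficiency)^cycles, which is a textbook simplification and not a formula stated in this specification; the function name and the per-cycle efficiency parameter are hypothetical. Real reactions plateau well before high cycle numbers are reached.

```python
def expected_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies copy number by (1 + efficiency).

    efficiency = 1.0 models perfect doubling; lower values model incomplete
    extension. This ignores reagent depletion and plateau effects.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (5, 10, 20, 30):
        print(n, expected_copies(1, n))
```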

Nucleotide: As used herein, the term “nucleotide” refers to a base-sugar-phosphate combination. Nucleotides are monomeric units of a nucleic acid molecule (DNA and RNA). The term nucleotide includes ribonucleoside triphosphates ATP, UTP, CTP, GTP and deoxyribonucleoside triphosphates such as dATP, dCTP, dITP, dUTP, dGTP, dTTP, or derivatives thereof. Such derivatives include, for example, [αS]dATP, 7-deaza-dGTP and 7-deaza-dATP. The term nucleotide as used herein also refers to dideoxyribonucleoside triphosphates (ddNTPs) and their derivatives. Illustrated examples of dideoxyribonucleoside triphosphates include, but are not limited to, ddATP, ddCTP, ddGTP, ddITP, and ddTTP. According to the present invention, a “nucleotide” may be unlabeled or detectably labeled by well known techniques. Detectable labels include, for example, radioactive isotopes, fluorescent labels, chemiluminescent labels, bioluminescent labels and enzyme labels.

Nucleic Acid Molecule: As used herein, the phrase “nucleic acid molecule” refers to a sequence of contiguous nucleotides (riboNTPs, dNTPs or ddNTPs, or combinations thereof) of any length which may encode a full-length polypeptide or a fragment of any length thereof, or which may be non-coding. As used herein, the terms “nucleic acid molecule” and “polynucleotide” may be used interchangeably and include both RNA and DNA.

Oligonucleotide: As used herein, the term “oligonucleotide” refers to a synthetic or natural molecule comprising a covalently linked sequence of nucleotides which are joined by a phosphodiester bond between the 3′ position of the pentose of one nucleotide and the 5′ position of the pentose of the adjacent nucleotide.

Polypeptide: As used herein, the term “polypeptide” refers to a sequence of contiguous amino acids, of any length. The terms “peptide,” “oligopeptide,” or “protein” may be used interchangeably herein with the term “polypeptide.”

Hybridization: As used herein, the terms “hybridization” and “hybridizing” refer to base pairing of two complementary single-stranded nucleic acid molecules (RNA and/or DNA) to give a double stranded molecule. As used herein, two nucleic acid molecules may hybridize, although the base pairing is not completely complementary. Accordingly, mismatched bases do not prevent hybridization of two nucleic acid molecules provided that appropriate conditions, well known in the art, are used. In some aspects, hybridization is said to be under “stringent conditions.” By “stringent conditions,” as the phrase is used herein, is meant overnight incubation at 42° C. in a solution comprising: 50% formamide, 5×SSC (750 mM NaCl, 75 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5×Denhardt's solution, 10% dextran sulfate, and 20 μg/ml denatured, sheared salmon sperm DNA, followed by washing the filters in 0.1×SSC at about 65° C.
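The definition above gives the composition of 5×SSC but not of the 0.1×SSC wash. Assuming that SSC concentrations scale linearly with the fold designation, the wash composition follows directly from the stated 5× values. A minimal sketch (the function and constant names are hypothetical):

```python
# 5x SSC composition as stated in the definition of stringent conditions.
SSC_5X_MM = {"NaCl": 750.0, "trisodium citrate": 75.0}


def ssc_composition(fold: float) -> dict:
    """Scale the stated 5x SSC composition linearly to another fold strength.

    Returns millimolar concentrations keyed by solute name.
    """
    if fold <= 0:
        raise ValueError("fold must be positive")
    return {solute: mm * fold / 5.0 for solute, mm in SSC_5X_MM.items()}


if __name__ == "__main__":
    for fold in (5.0, 1.0, 0.1):
        scaled = {k: round(v, 3) for k, v in ssc_composition(fold).items()}
        print(f"{fold}x SSC:", scaled)
```

Under this assumption, 1×SSC corresponds to 150 mM NaCl and 15 mM trisodium citrate, and the 0.1×SSC wash to 15 mM NaCl and 1.5 mM trisodium citrate.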

Reaction Buffers: The invention further includes reaction buffers for performing recombination reactions (e.g., L×R reactions, B×P reactions, etc.) and reaction mixtures which comprise such reaction buffers, as well as methods employing reaction buffers of the invention for performing recombination reactions and products of recombination reactions produced using such reaction buffers. Typically, reaction buffers of the invention will contain one or more of the following components: (1) one or more buffering agents (e.g., sodium phosphate, sodium acetate, 2-(N-morpholino)-ethanesulfonic acid (MES), tris-(hydroxymethyl)aminomethane (Tris), 3-(cyclohexylamino)-2-hydroxy-1-propanesulfonic acid (CAPS), citrate, N-2-hydroxyethylpiperazine-N′-2-ethanesulfonic acid (HEPES), acetate, 3-(N-morpholino)propanesulfonic acid (MOPS), N-tris(hydroxymethyl)methyl-3-aminopropanesulfonic acid (TAPS), etc.), (2) one or more salts (e.g., NaCl, KCl, etc.), (3) one or more chelating agents (e.g., one or more chelating agents which predominantly chelate divalent metal ions such as EDTA or EGTA), (4) one or more polyamines (e.g., spermidine, spermine, etc.), (5) one or more proteins which are not typically directly involved in recombination reactions (e.g., BSA, ovalbumin, etc.), or (6) one or more diluents (e.g., water).

The concentration of the buffering agent in the reaction buffer of the invention will vary with the particular buffering agent used. Typically, the working concentration (i.e., the concentration in the reaction mixture) of the buffering agent will be from about 5 mM to about 500 mM (e.g., about 10 mM, about 15 mM, about 20 mM, about 25 mM, about 30 mM, about 35 mM, about 40 mM, about 45 mM, about 50 mM, about 55 mM, about 60 mM, about 65 mM, about 70 mM, about 75 mM, about 80 mM, about 85 mM, about 90 mM, about 95 mM, about 100 mM, from about 5 mM to about 500 mM, from about 10 mM to about 500 mM, from about 20 mM to about 500 mM, from about 25 mM to about 500 mM, from about 30 mM to about 500 mM, from about 40 mM to about 500 mM, from about 50 mM to about 500 mM, from about 75 mM to about 500 mM, from about 100 mM to about 500 mM, from about 25 mM to about 50 mM, from about 25 mM to about 75 mM, from about 25 mM to about 100 mM, from about 25 mM to about 200 mM, from about 25 mM to about 300 mM, etc.). When Tris (e.g., Tris-HCl) is used, the Tris working concentration will typically be from about 5 mM to about 100 mM, from about 5 mM to about 75 mM, from about 10 mM to about 75 mM, from about 10 mM to about 60 mM, from about 10 mM to about 50 mM, from about 25 mM to about 50 mM, etc.
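Since the ranges above are working concentrations (i.e., concentrations in the final reaction mixture), the volume of a concentrated stock needed follows from the dilution relation C1·V1 = C2·V2. The sketch below is illustrative only: the 1 M stock, 25 mM target, and 20 µL reaction volume are hypothetical values chosen for the example, and the "about" bounds of the stated range are treated as exact limits for simplicity.

```python
def stock_volume_ul(stock_mm: float, working_mm: float, reaction_volume_ul: float) -> float:
    """Volume of stock (in µL) giving the working concentration, from C1*V1 = C2*V2."""
    if stock_mm <= 0 or reaction_volume_ul <= 0:
        raise ValueError("stock concentration and reaction volume must be positive")
    if working_mm > stock_mm:
        raise ValueError("working concentration cannot exceed stock concentration")
    return working_mm * reaction_volume_ul / stock_mm


def in_typical_range(working_mm: float, low_mm: float = 5.0, high_mm: float = 500.0) -> bool:
    """Check a working concentration against the stated 5 mM to 500 mM range."""
    return low_mm <= working_mm <= high_mm


if __name__ == "__main__":
    # Hypothetical example: 1 M (1000 mM) Tris-HCl stock, 25 mM working
    # concentration, 20 µL reaction mixture.
    volume = stock_volume_ul(stock_mm=1000.0, working_mm=25.0, reaction_volume_ul=20.0)
    print(f"{volume:.2f} µL of stock; within typical range: {in_typical_range(25.0)}")
```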

The final pH of solutions of the invention will generally be set and maintained by buffering agents present in reaction buffers of the invention. The pH of reaction buffers of the invention, and hence reaction mixtures of the invention, will vary with the particular use and the buffering agent present but will often be from about pH 5.5 to about pH 9.0 (e.g., about pH 6.0, about pH 6.5, about pH 7.0, about pH 7.1, about pH 7.2, about pH 7.3, about pH 7.4, about pH 7.5, about pH 7.6, about pH 7.7, about pH 7.8, about pH 7.9, about pH 8.0, about pH 8.1, about pH 8.5, about pH 9.0, from about pH 6.0 to about pH 8.5, from about pH 6.5 to about pH 8.5, from about pH 7.0 to about pH 8.5, from about pH 7.5 to about pH 8.5, from about pH 6.0 to about pH 8.0, from about pH 6.0 to about pH 7.7, from about pH 6.0 to about pH 7.5, from about pH 6.0 to about pH 7.0, from about pH 7.2 to about pH 7.7, from about pH 7.3 to about pH 7.7, from about pH 7.4 to about pH 7.6, from about pH 7.0 to about pH 7.4, from about pH 7.6 to about pH 8.0, from about pH 7.6 to about pH 8.5, etc.).

As indicated, one or more salts (e.g., NaCl, KCl, etc.) may be included in reaction buffers of the invention. In many instances, salts used in reaction buffers of the invention will dissociate in solution to generate at least one species which is monovalent (e.g., Na+, K+, etc.). When included in reaction buffers of the invention, salts will often be present either individually or in a combined concentration of from about 0.5 mM to about 500 mM (e.g., about 1 mM, about 2 mM, about 3 mM, about 5 mM, about 10 mM, about 12 mM, about 15 mM, about 17 mM, about 20 mM, about 22 mM, about 23 mM, about 24 mM, about 25 mM, about 27 mM, about 30 mM, about 35 mM, about 40 mM, about 45 mM, about 50 mM, about 55 mM, about 60 mM, about 64 mM, about 65 mM, about 70 mM, about 75 mM, about 80 mM, about 85 mM, about 90 mM, about 95 mM, about 100 mM, about 120 mM, about 140 mM, about 150 mM, about 175 mM, about 200 mM, about 225 mM, about 250 mM, about 275 mM, about 300 mM, about 325 mM, about 350 mM, about 375 mM, about 400 mM, from about 1 mM to about 500 mM, from about 5 mM to about 500 mM, from about 10 mM to about 500 mM, from about 20 mM to about 500 mM, from about 30 mM to about 500 mM, from about 40 mM to about 500 mM, from about 50 mM to about 500 mM, from about 60 mM to about 500 mM, from about 65 mM to about 500 mM, from about 75 mM to about 500 mM, from about 85 mM to about 500 mM, from about 90 mM to about 500 mM, from about 100 mM to about 500 mM, from about 125 mM to about 500 mM, from about 150 mM to about 500 mM, from about 200 mM to about 500 mM, from about 10 mM to about 100 mM, from about 10 mM to about 75 mM, from about 10 mM to about 50 mM, from about 20 mM to about 200 mM, from about 20 mM to about 150 mM, from about 20 mM to about 125 mM, from about 20 mM to about 100 mM, from about 20 mM to about 80 mM, from about 20 mM to about 75 mM, from about 20 mM to about 60 mM, from about 20 mM to about 50 mM, from about 30 mM to about 500 mM, from about 30 mM to about 100 mM, from about 30 mM to about 70 mM, from about 30 mM to about 50 mM, etc.).

As also indicated above, one or more agents which chelate metal ions (e.g., monovalent or divalent metal ions) with relatively high affinity may also be present in reaction buffers of the invention. Examples of compounds which chelate metal ions with relatively high affinity include ethylenediamine tetraacetic acid (EDTA), diethylenetriaminepentaacetic acid (DTPA), triethylenetetraamine hexaacetic acid (TTHA), [ethylenebis(oxyethylenenitrilo)]tetraacetic acid (EGTA), and propylenetriaminepentaacetic acid (PTPA). The free acid or salt of chelating agents may be used to prepare reaction buffers of the invention.

When included in reaction buffers of the invention, chelating agents will often be present either individually or in a combined concentration of from about 0.1 mM to about 50 mM (e.g., about 0.2 mM, about 0.3 mM, about 0.5 mM, about 0.7 mM, about 0.9 mM, about 1 mM, about 2 mM, about 3 mM, about 4 mM, about 5 mM, about 6 mM, about 10 mM, about 12 mM, about 15 mM, about 17 mM, about 20 mM, about 22 mM, about 23 mM, about 24 mM, about 25 mM, about 27 mM, about 30 mM, about 35 mM, about 40 mM, about 45 mM, about 50 mM, from about 0.1 mM to about 50 mM, from about 0.5 mM to about 50 mM, from about 1 mM to about 50 mM, from about 2 mM to about 50 mM, from about 3 mM to about 50 mM, from about 0.5 mM to about 20 mM, from about 0.5 mM to about 10 mM, from about 0.5 mM to about 5 mM, from about 0.5 mM to about 2.5 mM, from about 1 mM to about 20 mM, from about 1 mM to about 10 mM, from about 1 mM to about 5 mM, from about 1 mM to about 3.4 mM, from about 0.5 mM to about 3.0 mM, from about 1 mM to about 3.0 mM, from about 1.5 mM to about 3.0 mM, from about 2 mM to about 3.0 mM, from about 0.5 mM to about 2.5 mM, from about 1 mM to about 2.5 mM, from about 1.5 mM to about 2.5 mM, from about 2 mM to about 3.0 mM, from about 2.5 mM to about 3.0 mM, from about 0.5 mM to about 2 mM, from about 0.5 mM to about 1.5 mM, from about 0.5 mM to about 1.1 mM, etc.).

Reaction buffers of the invention may also contain one or more polyamines (e.g., spermine, spermidine, protamine, polylysine, and polyethylenimine, etc.), which may be synthetic or naturally occurring. When included in reaction buffers of the invention, polyamines will often be present either individually or in a combined concentration of from about 0.1 mM to about 50 mM (e.g., about 0.2 mM, about 0.3 mM, about 0.5 mM, about 0.7 mM, about 0.9 mM, about 1 mM, about 2 mM, about 3 mM, about 4 mM, about 5 mM, about 6 mM, about 6.5 mM, about 7 mM, about 7.5 mM, about 8 mM, about 8.5 mM, about 9 mM, about 9.5 mM, about 10 mM, about 12 mM, about 15 mM, about 17 mM, about 20 mM, about 22 mM, about 23 mM, about 24 mM, about 25 mM, about 27 mM, about 30 mM, about 35 mM, about 40 mM, about 45 mM, about 50 mM, from about 0.1 mM to about 50 mM, from about 0.5 mM to about 50 mM, from about 1 mM to about 50 mM, from about 2 mM to about 50 mM, from about 3 mM to about 50 mM, from about 0.5 mM to about 20 mM, from about 0.5 mM to about 10 mM, from about 0.5 mM to about 5 mM, from about 0.5 mM to about 2.5 mM, from about 1 mM to about 20 mM, from about 1 mM to about 10 mM, from about 1 mM to about 5 mM, from about 1 mM to about 3.4 mM, from about 0.5 mM to about 3.0 mM, from about 1 mM to about 3.0 mM, from about 1.5 mM to about 3.0 mM, from about 2 mM to about 3.0 mM, from about 0.5 mM to about 2.5 mM, from about 1 mM to about 2.5 mM, from about 1.5 mM to about 2.5 mM, from about 2 mM to about 3.0 mM, from about 2.5 mM to about 3.0 mM, from about 0.5 mM to about 2 mM, from about 0.5 mM to about 1.5 mM, from about 0.5 mM to about 1.1 mM, from about 7.6 mM to about 20 mM, from about 7.7 mM to about 20 mM, from about 7.8 mM to about 20 mM, from about 8.0 mM to about 20 mM, from about 8.1 mM to about 20 mM, from about 8.2 mM to about 20 mM, from about 8.3 mM to about 20 mM, from about 8.4 mM to about 20 mM, from about 8.5 mM to about 20 mM, from about 9.0 mM to about 20 mM, from about 10.0 mM to about 20 mM, from about 12.0 mM to about 20 mM, from about 7.6 mM to about 50 mM, from about 8.0 mM to about 50 mM, etc.). For example, reaction buffers of the invention may contain spermidine at a concentration of from about 7.6 mM to about 20 mM, from about 7.7 mM to about 20 mM, from about 7.8 mM to about 20 mM, from about 8.0 mM to about 20 mM, from about 8.1 mM to about 20 mM, from about 8.2 mM to about 20 mM, from about 8.3 mM to about 20 mM, from about 8.4 mM to about 20 mM, from about 8.5 mM to about 20 mM, from about 9.0 mM to about 20 mM, from about 10.0 mM to about 20 mM, from about 12.0 mM to about 20 mM, from about 7.6 mM to about 50 mM, from about 8.0 mM to about 50 mM, etc.

Reaction buffers of the invention may also contain one or more proteins which are not typically directly involved in recombination reactions (e.g., bovine serum albumin (BSA); ovalbumin; immunoglobulins, such as IgE, IgG, IgD; etc.). When included in reaction buffers of the invention, such proteins will often be present either individually or in a combined concentration of from about 0.1 mg/ml to about 50 mg/ml (e.g., about 0.1 mg/ml, about 0.2 mg/ml, about 0.3 mg/ml, about 0.4 mg/ml, about 0.5 mg/ml, about 0.6 mg/ml, about 0.7 mg/ml, about 0.8 mg/ml, about 0.9 mg/ml, about 1.0 mg/ml, about 1.1 mg/ml, about 1.3 mg/ml, about 1.5 mg/ml, about 1.7 mg/ml, about 2.0 mg/ml, about 2.5 mg/ml, about 3.5 mg/ml, about 5.0 mg/ml, about 7.5 mg/ml, about 10 mg/ml, about 15 mg/ml, about 20 mg/ml, about 25 mg/ml, about 30 mg/ml, about 35 mg/ml, about 40 mg/ml, from about 0.5 mg/ml to about 30 mg/ml, from about 0.75 mg/ml to about 30 mg/ml, from about 1.0 mg/ml to about 30 mg/ml, from about 2.0 mg/ml to about 30 mg/ml, from about 3.0 mg/ml to about 30 mg/ml, from about 4.0 mg/ml to about 30 mg/ml, from about 5.0 mg/ml to about 30 mg/ml, from about 7.5 mg/ml to about 30 mg/ml, from about 10 mg/ml to about 30 mg/ml, from about 15 mg/ml to about 30 mg/ml, from about 0.5 mg/ml to about 20 mg/ml, from about 0.5 mg/ml to about 10 mg/ml, from about 0.5 mg/ml to about 5 mg/ml, from about 0.5 mg/ml to about 2 mg/ml, from about 0.5 mg/ml to about 1 mg/ml, from about 1 mg/ml to about 10 mg/ml, from about 1 mg/ml to about 5 mg/ml, from about 1 mg/ml to about 2 mg/ml, etc.).

Examples of reaction buffers of the invention include the following:

(1) 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 1 mg/ml BSA, 64 mM NaCl, 8 mM spermidine;
(2) 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 1 mg/ml BSA, 64 mM NaCl, 10 mM spermidine;
(3) 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 1 mg/ml BSA, 64 mM NaCl, 12 mM spermidine;
(4) 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 1 mg/ml BSA, 75 mM NaCl, 8 mM spermidine;
(5) 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 1 mg/ml BSA, 64 mM NaCl, 15 mM spermidine;
(6) 25 mM Tris-HCl (pH 7.5), 1 mM EDTA, 1 mg/ml BSA, 64 mM NaCl, 8 mM spermidine;
(7) 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 2 mg/ml BSA, 64 mM NaCl, 8 mM spermidine;
(8) 25 mM Tris-HCl (pH 7.5), 5 mM EDTA, 1 mg/ml BSA, 64 mM NaCl, 8 mM spermidine;
(9) 25 mM Tris-HCl (pH 7.5), 1 mM EDTA, 2 mg/ml BSA, 64 mM NaCl, 8 mM spermidine;
(10) 100 mM Tris-HCl (pH 7.5), 1 mM EDTA, 1 mg/ml BSA, 64 mM NaCl, 10 mM spermidine;
(11) 75 mM Tris-HCl (pH 7.5), 1 mM EDTA, 1 mg/ml BSA, 65 mM NaCl, 8 mM spermidine;
(12) 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 1 mg/ml BSA, 64 mM NaCl, 8 mM spermine;
(13) 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 1 mg/ml BSA, 65 mM NaCl, 8 mM spermidine;
(14) 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 1 mg/ml BSA, 64 mM KCl, 8 mM spermidine; and
(15) 75 mM Tris-HCl (pH 7.5), 1 mM EDTA, 1 mg/ml BSA, 64 mM KCl, 8 mM spermidine.

Reaction buffers of the invention may be prepared as concentrated solutions which are diluted to a working concentration for final use. For example, a reaction buffer of the invention may be prepared as a 5× concentrate, with the working concentrations of components being 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 1 mg/ml BSA, 64 mM NaCl, and 8 mM spermidine. Such a 5× solution would contain 250 mM Tris-HCl (pH 7.5), 5 mM EDTA, 5 mg/ml BSA, 320 mM NaCl, and 40 mM spermidine. Thus, a five-fold dilution is required to bring such a 5× solution to a working concentration. Reaction buffers of the invention may be prepared, for example, as 2×, 3×, 4×, 5×, 6×, 7×, 8×, 9×, 10×, etc. solutions. One major limitation on the fold concentration of such solutions is that, when compounds reach particular concentrations in solution, precipitation occurs. Thus, concentrated reaction buffers will generally be prepared such that the concentrations of the various components are low enough that precipitation of buffer components will not occur. As one skilled in the art would recognize, the upper limit of concentration which is feasible for each solution will vary with the particular solution and the components present.

In many instances, reaction buffers of the invention will be provided in sterile form. Sterilization may be performed on the individual components of reaction buffers prior to mixing or on reaction buffers after they are prepared. Sterilization of such solutions may be performed by any suitable means including autoclaving or ultrafiltration.

Nucleic acid molecules used in methods of the invention, as well as those prepared by methods of the invention, may be dissolved in an aqueous buffer and added to the reaction mixture. One suitable set of conditions is 4 μl CLONASE™ enzyme mixture (e.g., Invitrogen Corporation, Cat. Nos. 11791-019 and 11789-013), 4 μl 5× reaction buffer and nucleic acid and water to a final volume of 20 μl. This will typically result in the inclusion of about 200 ng of Int and about 80 ng of IHF in a 20 μl BP reaction and about 150 ng Int, about 25 ng IHF and about 30 ng Xis in a 20 μl LR reaction.

Additional suitable sets of conditions include the use of smaller reaction volumes, for example, 2 μl CLONASE™ enzyme mixture (e.g., Invitrogen Corporation, Cat. Nos. 11791-019 and 11789-013), 2 μl 5× reaction buffer and nucleic acid and water to a final volume of 10 μl. In other embodiments, a suitable set of conditions includes 2 μl CLONASE™ enzyme mixture (e.g., Invitrogen Corporation, Cat. Nos. 11791-019 and 11789-013), 1 μl 10× reaction buffer and nucleic acid and water to a final volume of 10 μl.
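The reaction set-ups described above follow a simple volume balance, which can be sketched as follows. This is only an illustrative calculation; the function name `reaction_plan` and its parameters are hypothetical and are not part of any kit or protocol.

```python
def reaction_plan(final_ul, enzyme_ul, buffer_fold):
    """Split a reaction volume into enzyme mix, concentrated buffer, and remainder.

    final_ul    -- final reaction volume in microliters
    enzyme_ul   -- volume of enzyme mixture added, in microliters
    buffer_fold -- fold concentration of the reaction buffer stock (e.g., 5 or 10)
    """
    # An N-fold buffer stock must make up 1/N of the final volume.
    buffer_ul = final_ul / buffer_fold
    # Whatever is left is available for nucleic acid plus water.
    remainder_ul = final_ul - enzyme_ul - buffer_ul
    if remainder_ul < 0:
        raise ValueError("enzyme and buffer volumes exceed the final volume")
    return {
        "enzyme_ul": enzyme_ul,
        "buffer_ul": buffer_ul,
        "nucleic_acid_and_water_ul": remainder_ul,
    }


# The three set-ups mentioned in the text.
plan_20ul_5x = reaction_plan(20, 4, 5)
plan_10ul_5x = reaction_plan(10, 2, 5)
plan_10ul_10x = reaction_plan(10, 2, 10)
```

Under these assumptions, the 20 μl reaction leaves 12 μl for nucleic acid and water, and the two 10 μl reactions leave 6 μl and 7 μl, respectively.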

Proteins for conducting an LR reaction may be stored in a suitable buffer, for example, LR Storage Buffer, which may comprise about 50 mM Tris at about pH 7.5, about 50 mM NaCl, about 0.25 mM EDTA, about 2.5 mM spermidine, and about 0.2 mg/ml BSA. When stored, proteins for an LR reaction may be stored at a concentration of about 37.5 ng/μl INT, 10 ng/μl IHF and 15 ng/μl XIS. Proteins for conducting a BP reaction may be stored in a suitable buffer, for example, BP Storage Buffer, which may comprise about 25 mM Tris at about pH 7.5, about 22 mM NaCl, about 5 mM EDTA, about 5 mM spermidine, about 1 mg/ml BSA, and about 0.0025% Triton X-100. When stored, proteins for a BP reaction may be stored at a concentration of about 37.5 ng/μl INT and 20 ng/μl IHF. One skilled in the art will recognize that enzymatic activity may vary in different preparations of enzymes. The amounts suggested above may be modified to adjust for the amount of activity in any specific preparation of enzymes.
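As a rough illustration of how storage concentrations translate into protein mass, the sketch below multiplies each stated storage concentration by a pipetted volume. The dictionary and function names are hypothetical, and the result is nominal arithmetic only; as noted above, actual amounts may be adjusted for the activity of a given enzyme preparation.

```python
# Storage concentrations in ng/μl, taken from the paragraph above.
LR_STORAGE_NG_PER_UL = {"INT": 37.5, "IHF": 10.0, "XIS": 15.0}
BP_STORAGE_NG_PER_UL = {"INT": 37.5, "IHF": 20.0}


def protein_mass_ng(storage_ng_per_ul, volume_ul):
    """Nominal mass (ng) of each protein in a given volume of stored mixture."""
    return {name: conc * volume_ul for name, conc in storage_ng_per_ul.items()}


# Nominal protein content of 2 μl of each stored mixture.
lr_2ul = protein_mass_ng(LR_STORAGE_NG_PER_UL, 2)
bp_2ul = protein_mass_ng(BP_STORAGE_NG_PER_UL, 2)
```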

A suitable 5× reaction buffer for conducting recombination reactions may comprise 100 mM Tris pH 7.5, 88 mM NaCl, 20 mM EDTA, 20 mM spermidine, and 4 mg/ml BSA. Thus, in a recombination reaction, the final buffer concentrations may be 20 mM Tris pH 7.5, 17.6 mM NaCl, 4 mM EDTA, 4 mM spermidine, and 0.8 mg/ml BSA. Those skilled in the art will appreciate that the final reaction mixture may incorporate additional components added with the reagents used to prepare the mixture, for example, a BP reaction may include 0.005% Triton X-100 incorporated from the BP Clonase™.

In additional embodiments, a 10× reaction buffer for conducting recombination reactions may be prepared and comprise 200 mM Tris pH 7.5, 176 mM NaCl, 40 mM EDTA, 40 mM spermidine, and 8 mg/ml BSA. Thus, in a recombination reaction, the final buffer concentrations may be 20 mM Tris pH 7.5, 17.6 mM NaCl, 4 mM EDTA, 4 mM spermidine, and 0.8 mg/ml BSA. Those skilled in the art will appreciate that the final reaction mixture may incorporate additional components added with the reagents used to prepare the mixture, for example, a BP reaction may include 0.01% Triton X-100 incorporated from the BP Clonase™.
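The relationship between a concentrated buffer and its final concentrations is a division by the fold factor. The following minimal sketch, using hypothetical names, checks that the 5× and 10× buffers described above yield the same final concentrations.

```python
import math

# Component concentrations of the concentrated buffers (mM, except BSA in mg/ml).
BUFFER_5X = {"Tris": 100, "NaCl": 88, "EDTA": 20, "spermidine": 20, "BSA": 4}
BUFFER_10X = {"Tris": 200, "NaCl": 176, "EDTA": 40, "spermidine": 40, "BSA": 8}


def final_concentrations(stock, fold):
    """Divide each stock concentration by the fold factor of the concentrate."""
    return {name: value / fold for name, value in stock.items()}


final_from_5x = final_concentrations(BUFFER_5X, 5)
final_from_10x = final_concentrations(BUFFER_10X, 10)

# Both concentrates should give the same working buffer.
same_working_buffer = all(
    math.isclose(final_from_5x[name], final_from_10x[name]) for name in BUFFER_5X
)
```

Here `final_from_5x` works out to 20 mM Tris, 17.6 mM NaCl, 4 mM EDTA, 4 mM spermidine, and 0.8 mg/ml BSA, matching the final concentrations given in the text.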

In particular embodiments, particularly those in which attL sites are to be recombined with attR sites, the final reaction mixture may include about 50 mM Tris HCl, pH 7.5, about 1 mM EDTA, about 1 mg/ml BSA, about 75 mM NaCl and about 7.5 mM spermidine in addition to recombination enzymes and the nucleic acids to be combined. In other embodiments, particularly those in which an attB site is to be recombined with an attP site, the final reaction mixture may include about 25 mM Tris HCl, pH 7.5, about 5 mM EDTA, about 1 mg/ml bovine serum albumin (BSA), about 22 mM NaCl, and about 5 mM spermidine.

In some embodiments, particularly those in which attL sites are to be recombined with attR sites, the final reaction mixture may include about 40 mM Tris HCl, pH 7.5, about 1 mM EDTA, about 1 mg/ml BSA, about 64 mM NaCl and about 8 mM spermidine in addition to recombination enzymes and the nucleic acids to be combined. One of skill in the art will appreciate that the reaction conditions may be varied somewhat without departing from the invention. For example, the pH of the reaction may be varied from about 7.0 to about 8.0; the concentration of buffer may be varied from about 25 mM to about 100 mM; the concentration of EDTA may be varied from about 0.5 mM to about 2 mM; the concentration of NaCl may be varied from about 25 mM to about 150 mM; and the concentration of BSA may be varied from about 0.5 mg/ml to about 5 mg/ml. In other embodiments, particularly those in which an attB site is to be recombined with an attP site, the final reaction mixture may include about 25 mM Tris HCl, pH 7.5, about 5 mM EDTA, about 1 mg/ml bovine serum albumin (BSA), about 22 mM NaCl, about 5 mM spermidine and about 0.005% detergent (e.g., Triton X-100).
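The variation ranges given above for attL × attR reaction conditions can be expressed as a simple range check. This is a sketch only: the key names and the function `out_of_range` are hypothetical, and the limits are treated as hard bounds even though the text qualifies each with "about".

```python
# Ranges stated in the text for attL x attR reactions: (lower, upper).
LR_RANGES = {
    "pH": (7.0, 8.0),
    "buffer_mM": (25, 100),
    "EDTA_mM": (0.5, 2),
    "NaCl_mM": (25, 150),
    "BSA_mg_per_ml": (0.5, 5),
}


def out_of_range(conditions):
    """Return the names of supplied conditions that fall outside the stated ranges."""
    return sorted(
        name
        for name, (low, high) in LR_RANGES.items()
        if name in conditions and not low <= conditions[name] <= high
    )


# The attL x attR mixture described in the text.
described_mix = {"pH": 7.5, "buffer_mM": 40, "EDTA_mM": 1, "NaCl_mM": 64, "BSA_mg_per_ml": 1}
# A hypothetical mixture with too much salt and too little BSA.
altered_mix = {"pH": 7.5, "buffer_mM": 40, "EDTA_mM": 1, "NaCl_mM": 200, "BSA_mg_per_ml": 0.1}

described_flags = out_of_range(described_mix)
altered_flags = out_of_range(altered_mix)
```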

In other embodiments, the recombination reactions may be prepared using a buffer which performs the functions of both the storage and reaction buffers in one. Suitably, in such embodiments, this buffer may comprise between about 100-200 mM Tris pH 7.5, between about 88-176 mM NaCl, between about 20-40 mM EDTA, between about 20-40 mM spermidine, and between about 4-8 mg/ml BSA. Those skilled in the art will appreciate that the final reaction mixture may incorporate additional components added with the reagents used to prepare the mixture, for example, a BP reaction may include between about 0.005-0.01% Triton X-100 incorporated from the BP Clonase™. These combination buffers would also include proteins for conducting an LR or a BP reaction. When stored, proteins for an LR reaction may be stored at a concentration of between about 37.5-75 ng/μl INT, between about 10-20 ng/μl IHF and between about 15-30 ng/μl XIS; proteins for a BP reaction may be stored at a concentration of between about 37.5-75 ng/μl INT and between about 20-40 ng/μl IHF.

Derivative: As used herein the term “derivative”, when used in reference to a vector, means that the derivative vector contains one or more (e.g., one, two, three, four, five, etc.) nucleic acid segments which share sequence similarity to at least one vector represented in one or more of FIG. 1, 2, 3, 5, 6, 7, 8, 9, 14, 15, 16, 17, 18, 19, 20, 21, 22, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 37, 41, 42, 43, 47, 53, 54, 55, 56, or 57. In particular embodiments, a derivative vector (1) may be obtained by alteration of a vector described herein (e.g., a vector represented in FIG. 1, 2, 3, 5, 6, 7, 8, 9, 14, 15, 16, 17, 18, 19, 20, 21, 22, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 37, 41, 42, 43, 47, 53, 54, 55, 56, or 57), or (2) may contain one or more elements (e.g., ampicillin resistance marker, attL1 recombination site, TOPO site, etc.) of a vector described herein. Further, as noted above, a derivative vector may contain one or more elements which share sequence similarity (e.g., at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, etc. sequence identity at the nucleotide level) to one or more elements of a vector described herein. Derivative vectors may also share at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, etc. sequence identity at the nucleotide level to the complete nucleotide sequence of a vector described herein. One example of a derivative vector is the vector represented in FIG. 26 after the ccdB/chloramphenicol resistance cassette has been replaced by another nucleic acid segment using a recombination reaction. Thus, derivative vectors include those which have been generated by performing a cloning reaction upon a vector described herein. Derivative vectors also include vectors which have been generated by the insertion of elements of a vector described herein into another vector. Often these derivative vectors will contain at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, etc. of the nucleic acid present in a vector described herein. Derivative vectors also include progeny of any of the vectors referred to above, as well as vectors referred to above which have been subjected to mutagenesis (e.g., random mutagenesis). The invention includes vectors which are derivatives of vectors described herein, as well as uses of these vectors in various described methods and compositions comprising these vectors.

Other terms used in the fields of recombinant nucleic acid technology and molecular and cell biology as used herein will be generally understood by one of ordinary skill in the applicable arts.

Overview

The present invention relates to methods, compositions and kits for the recombinational joining of two or more segments or nucleic acid molecules or other molecules and/or compounds (or combinations thereof). The invention also relates to attaching such linked nucleic acid molecules or other molecules and/or compounds to one or more supports or structures preferably through recombination sites or portions thereof. Thus, the invention generally relates to linking any number of nucleic acids or other molecules and/or compounds via nucleic acid linkers comprising one or more recombination sites (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) or portions thereof.

The linked products produced by the invention may comprise any number of the same or different nucleic acids or other molecules and/or compounds, depending on the starting materials. Such starting materials include, but are not limited to, any nucleic acids (or derivatives thereof such as peptide nucleic acids (PNAs)), chemical compounds, detectably labeled molecules (such as fluorescent molecules and chemiluminescent molecules), drugs, peptides or proteins, lipids, carbohydrates and other molecules and/or compounds comprising one or more recombination sites or portions thereof. Through recombination of such recombination sites according to the invention, any number or combination of such starting molecules and/or compounds can be linked to make linked products of the invention. In addition, deletion or replacement of certain portions or components of the linked products of the invention can be accomplished by recombination.

In some embodiments, the joined segments may be inserted into a different nucleic acid molecule such as a vector, preferably by recombinational cloning methods but also by homologous recombination. Thus, in some embodiments, the present invention relates to the construction of nucleic acid molecules (RNA or DNA) by combining two or more segments of nucleic acid (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) by a recombination reaction and inserting the joined two or more segments into a vector by recombinational cloning.

In embodiments where the joined nucleic acid molecules are to be further combined with an additional nucleic acid molecule by a recombination reaction, the timing of the two recombination events, i.e., the joining of the segments and the insertion of the segments into a vector, is not critical. That is to say, it is not critical to the present invention, for example, whether the two or more nucleic acid segments are joined together before insertion into the vector or whether one recombination site on each segment first reacts with a recombination site on the vector and subsequently the recombination sites on the nucleic acid segments react with each other to join the segments. Moreover, the nucleic acid segments can be cloned in any one or a number of positions within the vector and do not need to be inserted adjacent to each other, although, in some embodiments, joining of two or more of such segments within the vector is preferred.

In accordance with the invention, recombinational cloning allows efficient selection and identification of molecules (particularly vectors) containing the combined nucleic acid segments. Thus, two or more nucleic acid segments of interest can be combined and, optionally, inserted into a single vector suitable for further manipulation of the combined nucleic acid molecule.

In a fundamental embodiment, at least two nucleic acid segments, each comprising at least one recombination site, are contacted with suitable recombination proteins to effect the joining of all or a portion of the two molecules, depending on the position of the recombination sites in the molecules. Each individual nucleic acid segment may comprise a variety of sequences including, but not limited to, sequences suitable for use as primer sites (e.g., sequences to which a primer such as a sequencing primer or amplification primer may hybridize to initiate nucleic acid synthesis, amplification or sequencing), transcription or translation signals or regulatory sequences such as promoters and/or enhancers, ribosomal binding sites, Kozak sequences, start codons, termination signals such as stop codons, origins of replication, recombination sites (or portions thereof), selectable markers, and genes or portions of genes to create protein fusions (e.g., N-terminal or C-terminal) such as GST, GUS, GFP, YFP, CFP, maltose binding protein, 6 histidines (HIS6), epitopes, haptens and the like and combinations thereof. The vectors used for cloning such segments may also comprise these functional sequences (e.g., promoters, primer sites, etc.). After combination of the segments comprising such sequences and optionally the cloning of the sequences into one or more vectors (e.g., two, three, four, five, seven, ten, twelve, fifteen, etc.), the molecules may be manipulated in a variety of ways, including sequencing or amplification of the target nucleic acid molecule (i.e., by using at least one of the primer sites introduced by the integration sequence), mutation of the target nucleic acid molecule (i.e., by insertion, deletion or substitution in or on the target nucleic acid molecule), insertion into another molecule by homologous recombination, transcription of the target nucleic acid molecule, and protein expression from the target nucleic acid molecule or portions thereof (i.e., by expression of translation and/or transcription signals contained by the segments and/or vectors).

The present invention also relates to the generation of combinatorial libraries using the recombinational cloning methods disclosed. Thus, one or more of the nucleic acid segments joined may comprise a nucleic acid library. Such a library may comprise, for example, nucleic acid molecules corresponding to permutations of a sequence coding for a peptide, polypeptide or protein sequence. The permutations can be joined to another nucleic acid segment consisting of a single sequence or, alternatively, the second nucleic acid segment may also be a library corresponding to permutations of another peptide, polypeptide or protein sequence such that joining of the two segments may produce a library representing all possible combinations of all the permutations of the two peptide, polypeptide or protein sequences. These nucleic acid segments may be contiguous or non-contiguous. Numerous examples of the use of combinatorial libraries are known in the art. (See, e.g., Waterhouse, et al., Nucleic Acids Res., 1993, Vol. 21, No. 9, 2265-2266, Tsurushita, et al., Gene, 1996, Vol. 172 No. 1, 59-63, Persson, Int. Rev. Immunol. 1993 10:2-3 153-63, Chanock, et al., Infect Agents Dis 1993 June 2:3 118-31, Burioni, et al., Res Virol 1997 March-April 148:2 161-4, Leung, Thromb. Haemost. 1995 July 74:1373-6, Sandhu, Crit. Rev. Biotechnol. 1992, 12:5-6 437-62 and U.S. Pat. Nos. 5,733,743, 5,871,907 and 5,858,657, all of which are specifically incorporated herein by reference.)
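The statement that joining two libraries may represent all possible combinations of their members can be illustrated in silico as a Cartesian product. The sketch below is purely conceptual: the function `join_libraries` and the short example sequences are hypothetical, and simple string concatenation stands in for the recombinational joining itself.

```python
from itertools import product


def join_libraries(first_library, second_library, linker=""):
    """Return every pairwise fusion of a member of the first library with a member of the second."""
    return [a + linker + b for a, b in product(first_library, second_library)]


# Hypothetical variant segments; real libraries would be far larger.
first_library = ["ATGGCT", "ATGGCC", "ATGGCA"]
second_library = ["GGTTAA", "GGCTAA"]

combined = join_libraries(first_library, second_library)
# The combined library has len(first_library) * len(second_library) members.
```

With three members in the first library and two in the second, the combined library contains six distinct fusions.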

When one or more nucleic acid segments used in methods and compositions of the invention are mutated, these segments may contain either (1) a specified number of mutations or (2) an average specified number of mutations. Further, these mutations may be scored with reference to the nucleic acid segments themselves or the expression products (e.g., polypeptides) of such nucleic acid segments. For example, nucleic acid molecules of a library may be mutated to produce nucleic acid molecules which are, on average, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to corresponding nucleic acid molecules of the original library. Similarly, nucleic acid molecules of a library may be mutated to produce nucleic acid molecules which encode polypeptides that are, on average, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to polypeptides encoded by corresponding nucleic acid molecules of the original library.
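One way to picture "on average, at least X% identical" is to compute a position-by-position identity for each original/mutated pair and then average across the library. The sketch below assumes equal-length, pre-aligned sequences (no insertions or deletions); the function names and example sequences are hypothetical, and a real analysis would normally rely on a sequence alignment tool.

```python
def percent_identity(original, mutated):
    """Percent of positions at which two equal-length sequences agree."""
    if len(original) != len(mutated):
        raise ValueError("this sketch only compares sequences of equal length")
    matches = sum(a == b for a, b in zip(original, mutated))
    return 100.0 * matches / len(original)


def mean_percent_identity(original_library, mutated_library):
    """Average identity over corresponding members of two libraries."""
    scores = [percent_identity(o, m) for o, m in zip(original_library, mutated_library)]
    return sum(scores) / len(scores)


# Hypothetical two-member libraries.
original_library = ["AAAAAAAAAA", "CCCCCCCCCC"]
mutated_library = ["AAAAAAAAAT", "CCCCCCCCGG"]

average_identity = mean_percent_identity(original_library, mutated_library)
```

In this example the two pairs are 90% and 80% identical, so the mutated library is, on average, 85% identical to the original.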

Recombination Sites

Recombination sites for use in the invention may be any nucleic acid that can serve as a substrate in a recombination reaction. Such recombination sites may be wild-type or naturally occurring recombination sites, or modified, variant, derivative, or mutant recombination sites. Examples of recombination sites for use in the invention include, but are not limited to, phage-lambda recombination sites (such as attP, attB, attL, and attR and mutants or derivatives thereof) and recombination sites from other bacteriophage such as phi80, P22, P2, 186, P4 and P1 (including lox sites such as loxP and loxP511). Mutated att sites (e.g., attB1-10, attP1-10, attR1-10 and attL1-10) are described in Example 9 below and in U.S. Appl. No. 60/136,744, filed May 28, 1999, and U.S. application Ser. No. 09/517,466, filed Mar. 2, 2000, which are specifically incorporated herein by reference. Other recombination sites having unique specificity (i.e., a first site will recombine with its corresponding site and will not recombine with a second site having a different specificity) are known to those skilled in the art and may be used to practice the present invention. Corresponding recombination proteins for these systems may be used in accordance with the invention with the indicated recombination sites. Other systems providing recombination sites and recombination proteins for use in the invention include the FLP/FRT system from Saccharomyces cerevisiae, the resolvase family (e.g., γδ, TndX, TnpX, Tn3 resolvase, Hin, Hjc, Gin, SpCCE1, ParA, and Cin), and IS231 and other Bacillus thuringiensis transposable elements. Other suitable recombination systems for use in the present invention include the XerC and XerD recombinases and the psi, dif and cer recombination sites in E. coli. Other suitable recombination sites may be found in U.S. Pat. No. 5,851,808 issued to Elledge and Liu which is specifically incorporated herein by reference. Suitable recombination proteins and mutant, modified, variant, or derivative recombination sites for use in the invention include those described in U.S. Pat. Nos. 5,888,732 and 6,143,557, and in U.S. application Ser. No. 09/438,358 (filed Nov. 12, 1999), based upon U.S. provisional application No. 60/108,324 (filed Nov. 13, 1998), and U.S. application Ser. No. 09/517,466 (filed Mar. 2, 2000), based upon U.S. provisional application No. 60/136,744 (filed May 28, 1999), as well as those associated with the GATEWAY™ Cloning Technology and MultiSite Gateway Cloning Technology available from Invitrogen Corp. (Carlsbad, Calif.), the entire disclosures of all of which are specifically incorporated herein by reference in their entireties.

Representative examples of recombination sites which can be used in the practice of the invention include att sites referred to above. The inventors have determined that att sites which specifically recombine with other att sites can be constructed by altering nucleotides in and near the 7 base pair overlap region. Thus, recombination sites suitable for use in the methods, compositions, and vectors of the invention include, but are not limited to, those with insertions, deletions or substitutions of one, two, three, four, or more nucleotide bases within the 15 base pair core region (GCTTTTTTATACTAA (SEQ ID NO:37)), which is identical in all four wild-type lambda att sites, attB, attP, attL and attR (see U.S. application Ser. Nos. 08/663,002, filed Jun. 7, 1996 (now U.S. Pat. No. 5,888,732) and 09/177,387, filed Oct. 23, 1998, which describe the core region in further detail, and the disclosures of which are incorporated herein by reference in their entireties). Recombination sites suitable for use in the methods, compositions, and vectors of the invention also include those with insertions, deletions or substitutions of one, two, three, four, or more nucleotide bases within the 15 base pair core region (GCTTTTTTATACTAA (SEQ ID NO:37)) which are at least 50% identical, at least 55% identical, at least 60% identical, at least 65% identical, at least 70% identical, at least 75% identical, at least 80% identical, at least 85% identical, at least 90% identical, or at least 95% identical to this 15 base pair core region.

Analogously, the core regions in attB1, attP1, attL1 and attR1 are identical to one another, as are the core regions in attB2, attP2, attL2 and attR2. Nucleic acid molecules suitable for use with the invention also include those which comprise insertions, deletions or substitutions of one, two, three, four, or more nucleotides within the seven base pair overlap region (TTTATAC, which is defined by the cut sites for the integrase protein and is the region where strand exchange takes place) that occurs within this 15 base pair core region (GCTTTTTTATACTAA (SEQ ID NO:37)). Examples of such mutants, fragments, variants and derivatives include, but are not limited to, nucleic acid molecules in which (1) the thymine at position 1 of the seven bp overlap region has been deleted or substituted with a guanine, cytosine, or adenine; (2) the thymine at position 2 of the seven bp overlap region has been deleted or substituted with a guanine, cytosine, or adenine; (3) the thymine at position 3 of the seven bp overlap region has been deleted or substituted with a guanine, cytosine, or adenine; (4) the adenine at position 4 of the seven bp overlap region has been deleted or substituted with a guanine, cytosine, or thymine; (5) the thymine at position 5 of the seven bp overlap region has been deleted or substituted with a guanine, cytosine, or adenine; (6) the adenine at position 6 of the seven bp overlap region has been deleted or substituted with a guanine, cytosine, or thymine; and (7) the cytosine at position 7 of the seven bp overlap region has been deleted or substituted with a guanine, thymine, or adenine; or any combination of one or more such deletions and/or substitutions within this seven bp overlap region. The nucleotide sequences of the exemplary seven base pair core region are set out below in Table 2.
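The single-position deletions and substitutions enumerated in items (1) through (7) above can be listed programmatically. The following is an illustrative sketch only; the function `single_position_variants` is hypothetical, and it covers single changes, not the combinations of changes that the text also contemplates.

```python
OVERLAP = "TTTATAC"  # seven base pair overlap region, positions 1 through 7
BASES = "ACGT"


def single_position_variants(overlap=OVERLAP):
    """List (position, change, resulting sequence) for every single deletion or substitution."""
    variants = []
    for index, base in enumerate(overlap):
        position = index + 1  # positions are numbered from 1, as in the text
        # Deletion of the base at this position.
        variants.append((position, "del", overlap[:index] + overlap[index + 1:]))
        # Substitution with each of the three other bases.
        for alternative in BASES:
            if alternative != base:
                variants.append(
                    (position, alternative, overlap[:index] + alternative + overlap[index + 1:])
                )
    return variants


variants = single_position_variants()
# Seven positions, each with one deletion and three substitutions: 28 entries.
```

Note that deleting the thymine at position 1, 2, or 3 yields the same six-base sequence (TTATAC), so the 28 listed changes correspond to fewer than 28 distinct sequences.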

The present invention also embodies the use of the recombination sites attB3 and attB4 shown below in a MultiSite Gateway recombination cloning system:

(SEQ ID NO: 141) attB3 5′ CAACTTTGTATAATAAAGTTG 3′
(SEQ ID NO: 142) attB4 5′ CAACTTTGTATAGAAAAGTTG 3′

These attB sites, like attB1 and attB2 sites, create sequence-specific recombination groups that do not recombine with non-like sequences. This sequence-specific recombination property of the attB sites confers directionality of cloning in standard Gateway cloning and directs the accurate assembly of multiple fragments when cloning with MultiSite Gateway.
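To see where the two sites shown above differ, the sequences can be compared position by position. This is a minimal sketch with a hypothetical helper function; it only reports sequence differences and makes no claim about which positions govern recombination specificity.

```python
ATTB3 = "CAACTTTGTATAATAAAGTTG"  # SEQ ID NO: 141
ATTB4 = "CAACTTTGTATAGAAAAGTTG"  # SEQ ID NO: 142


def differing_positions(first, second):
    """Return (1-based position, base in first, base in second) wherever two sequences differ."""
    if len(first) != len(second):
        raise ValueError("sequences must be the same length")
    return [
        (index + 1, a, b)
        for index, (a, b) in enumerate(zip(first, second))
        if a != b
    ]


differences = differing_positions(ATTB3, ATTB4)
# differences == [(13, 'A', 'G'), (14, 'T', 'A')]
```

The two 21-nucleotide sequences are identical except at positions 13 and 14.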

MultiSite Gateway is an extension of the Gateway site-specific recombinational cloning system. The introduction of att site specificities attB3 and attB4 (in addition to the attB1 and attB2 sets presently used in Gateway) allows the simultaneous cloning of multiple DNA fragments in a defined order and orientation. MultiSite Gateway applications are extensive and varied, including but not limited to: the expression of multiple gene products from a single vector, addition of promoter/tag elements to the ends of standard Gateway Entry Clones (attL1/L2), construction of gene-targeting vectors, engineering and shuffling of protein coding domains, construction of synthetic operons, biological and biochemical pathway engineering and genome engineering.

As in the present version of Gateway, to enter MultiSite Gateway, sets of Entry Clones are obtained or generated. Entry Clones are then simply mixed together with the appropriate MultiSite Gateway Destination Vector in a single LR reaction that results in the simultaneous cloning of multiple fragments into the Destination Vector backbone. The site-specific recombination reactions are precise, efficient and directional, resulting in all of the colonies recovered containing the desired Expression Clone constructs. MultiSite Gateway Entry Clones can be sequence validated and serve as source clones in the assembly of complex DNA constructions. This eliminates the need to sequence validate the final assembled products. Further, each element of a construct assembled using MultiSite Gateway can be replaced by any other element of similar recombinant ends, affording maximum flexibility in vector construction.

As described below in Examples 9-12, altered att sites have been constructed which demonstrate that (1) substitutions made within the first three positions of the seven base pair overlap (TTTATAC) strongly affect the specificity of recombination, (2) substitutions made in the last four positions (TTTATAC) only partially alter recombination specificity, and (3) nucleotide substitutions outside of the seven bp overlap, but elsewhere within the 15 base pair core region, do not affect specificity of recombination but do influence the efficiency of recombination. Thus, nucleic acid molecules and methods of the invention include those which comprise or employ one, two, three, four, five, six, eight, ten, or more recombination sites which affect recombination specificity, particularly one or more (e.g., one, two, three, four, five, six, eight, ten, twenty, thirty, forty, fifty, etc.) different recombination sites that may correspond substantially to the seven base pair overlap within the 15 base pair core region, having one or more mutations that affect recombination specificity. Particularly preferred such molecules may comprise a consensus sequence such as NNNATAC, wherein “N” refers to any nucleotide (i.e., may be A, G, T/U or C). Preferably, if one of the first three nucleotides in the consensus sequence is a T/U, then at least one of the other two of the first three nucleotides is not a T/U.
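The consensus and the stated preference can be expressed as a short check. The following is a minimal sketch only; the function names `matches_consensus` and `is_preferred` are hypothetical and are not part of the disclosure. Read literally, the preference excludes only overlap regions whose first three nucleotides are all T/U.

```python
import re

# Consensus from the text: three variable positions followed by ATAC.
# T and U are both accepted at the variable positions (assumption: the
# fixed ATAC tail is written with T, as in the text).
CONSENSUS = re.compile(r"^[ACGTU]{3}ATAC$")


def matches_consensus(overlap: str) -> bool:
    """Return True if a seven base overlap region fits NNNATAC."""
    return bool(CONSENSUS.match(overlap.upper()))


def is_preferred(overlap: str) -> bool:
    """Apply the stated preference on the first three nucleotides.

    If any of the first three bases is T/U, at least one of the other
    two must not be T/U. This fails only when all three are T/U.
    """
    seq = overlap.upper()
    if not matches_consensus(seq):
        return False
    first_three = seq[:3].replace("U", "T")
    return first_three != "TTT"
```

For example, `is_preferred("GAAATAC")` is true, while `is_preferred("TTTATAC")` is false because all three variable positions are T.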

The core sequence of each att site (attB, attP, attL and attR) can be divided into functional units consisting of integrase binding sites, integrase cleavage sites and sequences that determine specificity. As discussed below in Example 12, specificity determinants are defined by the first three positions following the integrase top strand cleavage site. These three positions are shown with underlining in the following reference sequence: CAACTTTTTTATAC AAAGTTG (SEQ ID NO:38). Modification of these three positions (64 possible combinations) which can be used to generate att sites which recombine with high specificity with other att sites having the same sequence for the first three nucleotides of the seven base pair overlap region are shown in Table 1.

TABLE 1
Modifications of the First Three Nucleotides of the att Site Seven Base Pair Overlap Region which Alter Recombination Specificity.

AAA   CAA   GAA   TAA
AAC   CAC   GAC   TAC
AAG   CAG   GAG   TAG
AAT   CAT   GAT   TAT
ACA   CCA   GCA   TCA
ACC   CCC   GCC   TCC
ACG   CCG   GCG   TCG
ACT   CCT   GCT   TCT
AGA   CGA   GGA   TGA
AGC   CGC   GGC   TGC
AGG   CGG   GGG   TGG
AGT   CGT   GGT   TGT
ATA   CTA   GTA   TTA
ATC   CTC   GTC   TTC
ATG   CTG   GTG   TTG
ATT   CTT   GTT   TTT

Representative examples of seven base pair att site overlap regions suitable for use in methods, compositions and vectors of the invention are shown in Table 2. The invention further includes nucleic acid molecules comprising one or more (e.g., one, two, three, four, five, six, eight, ten, twenty, thirty, forty, fifty, etc.) nucleotide sequences set out in Table 2. Thus, for example, in one aspect, the invention provides nucleic acid molecules comprising the nucleotide sequence GAAATAC, GATATAC, ACAATAC, or TGCATAC. However, in certain embodiments, the invention will not include nucleic acid molecules which comprise att site core regions set out herein in FIGS. 24A-24C or in Example 9.

TABLE 2
Representative Examples of Seven Base Pair att Site Overlap Regions Suitable for Use with the Invention.

AAAATAC   CAAATAC   GAAATAC   TAAATAC
AACATAC   CACATAC   GACATAC   TACATAC
AAGATAC   CAGATAC   GAGATAC   TAGATAC
AATATAC   CATATAC   GATATAC   TATATAC
ACAATAC   CCAATAC   GCAATAC   TCAATAC
ACCATAC   CCCATAC   GCCATAC   TCCATAC
ACGATAC   CCGATAC   GCGATAC   TCGATAC
ACTATAC   CCTATAC   GCTATAC   TCTATAC
AGAATAC   CGAATAC   GGAATAC   TGAATAC
AGCATAC   CGCATAC   GGCATAC   TGCATAC
AGGATAC   CGGATAC   GGGATAC   TGGATAC
AGTATAC   CGTATAC   GGTATAC   TGTATAC
ATAATAC   CTAATAC   GTAATAC   TTAATAC
ATCATAC   CTCATAC   GTCATAC   TTCATAC
ATGATAC   CTGATAC   GTGATAC   TTGATAC
ATTATAC   CTTATAC   GTTATAC   TTTATAC
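The entries of Tables 1 and 2 follow a simple pattern: each of the 64 possible three nucleotide combinations, followed in Table 2 by the fixed sequence ATAC. As an illustrative sketch (the function names are hypothetical, and the output order differs from the row layout of the tables), the sequences can be enumerated as follows:

```python
from itertools import product

BASES = "ACGT"
FIXED_TAIL = "ATAC"  # last four positions of the seven base pair overlap


def triplets():
    """All 4**3 = 64 combinations for the first three positions (Table 1)."""
    return ["".join(p) for p in product(BASES, repeat=3)]


def overlap_regions():
    """Each triplet followed by ATAC, giving the Table 2 sequences."""
    return [t + FIXED_TAIL for t in triplets()]


if __name__ == "__main__":
    for seq in overlap_regions():
        print(seq)
```

The unmodified overlap TTTATAC appears in this list alongside the 63 altered sequences.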

As noted above, alterations of nucleotides located 3′ to the three base pair region discussed above can also affect recombination specificity. For example, alterations within the last four positions of the seven base pair overlap can also affect recombination specificity.

The invention thus provides recombination sites which recombine with a cognate partner, as well as molecules which contain these recombination sites and methods for generating, identifying, and using these sites. Methods which can be used to identify such sites are set out below in Example 12. Examples of such recombination sites include att sites which contain seven base pair overlap regions which associate and recombine with cognate partners. The nucleotide sequences of specific examples of such seven base pair overlap regions are set out above in Table 2.

Further embodiments of the invention include isolated nucleic acid molecules comprising a nucleotide sequence at least 50% identical, at least 60% identical, at least 70% identical, at least 75% identical, at least 80% identical, at least 85% identical, at least 90% identical, or at least 95% identical to the nucleotide sequences of the seven bp overlap regions set out above in Table 2 or the 15 base pair core region shown in SEQ ID NO:37, as well as a nucleotide sequence complementary to any of these nucleotide sequences or fragments, variants, mutants, and derivatives thereof. Additional embodiments of the invention include compositions and vectors which contain these nucleic acid molecules, as well as methods for using these nucleic acid molecules.

By a polynucleotide having a nucleotide sequence at least, for example, 95% “identical” to a reference nucleotide sequence encoding a particular recombination site or portion thereof is intended that the nucleotide sequence of the polynucleotide is identical to the reference sequence except that the polynucleotide sequence may include up to five point mutations (e.g., insertions, substitutions, or deletions) per each 100 nucleotides of the reference nucleotide sequence encoding the recombination site. For example, to obtain a polynucleotide having a nucleotide sequence at least 95% identical to a reference attB1 nucleotide sequence (SEQ ID NO:5), up to 5% of the nucleotides in the attB1 reference sequence may be deleted or substituted with another nucleotide, or a number of nucleotides up to 5% of the total nucleotides in the attB1 reference sequence may be inserted into the attB1 reference sequence. These mutations of the reference sequence may occur at the 5′ or 3′ terminal positions of the reference nucleotide sequence or anywhere between those terminal positions, interspersed either individually among nucleotides in the reference sequence or in one or more contiguous groups within the reference sequence.

As a practical matter, whether any particular nucleic acid molecule is at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identical to, for instance, a given recombination site nucleotide sequence or portion thereof can be determined conventionally using known computer programs such as DNAsis software (Hitachi Software, San Bruno, Calif.) for initial sequence alignment followed by ESEE version 3.0 DNA/protein sequence software (cabot@trog.mbb.sfu.ca) for multiple sequence alignments. Alternatively, such determinations may be accomplished using the BESTFIT program (Wisconsin Sequence Analysis Package, Genetics Computer Group, University Research Park, 575 Science Drive, Madison, Wis. 53711), which employs a local homology algorithm (Smith and Waterman, Advances in Applied Mathematics 2: 482-489 (1981)) to find the best segment of homology between two sequences. When using DNAsis, ESEE, BESTFIT or any other sequence alignment program to determine whether a particular sequence is, for instance, 95% identical to a reference sequence according to the present invention, the parameters are set such that the percentage of identity is calculated over the full length of the reference nucleotide sequence and that gaps in homology of up to 5% of the total number of nucleotides in the reference sequence are allowed.
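The arithmetic behind a percent identity threshold can be illustrated with a short sketch. This is not the algorithm used by DNAsis, ESEE or BESTFIT; it assumes, for simplicity, that the two sequences are already aligned and of equal length (substitutions only, no gaps), and the function names are hypothetical.

```python
import math


def max_point_mutations(reference_length: int, percent_identity: float) -> int:
    """Largest whole number of point mutations that keeps the sequence at or
    above the given percent identity over the full reference length."""
    allowed = reference_length * (100.0 - percent_identity) / 100.0
    # Small tolerance guards against floating point round-off.
    return math.floor(allowed + 1e-9)


def percent_identity(reference: str, candidate: str) -> float:
    """Percent of reference positions that match the candidate.

    Assumption: sequences are pre-aligned and equal in length, so only
    substitutions are counted. A real alignment program also handles gaps.
    """
    if len(reference) != len(candidate):
        raise ValueError("sketch assumes pre-aligned, equal-length sequences")
    matches = sum(r == c for r, c in zip(reference.upper(), candidate.upper()))
    return 100.0 * matches / len(reference)
```

For a 100 nucleotide reference, `max_point_mutations(100, 95)` gives 5, matching the "up to five point mutations per each 100 nucleotides" stated above. For the 15 base pair core region of SEQ ID NO:37, a single substitution already lowers identity to 14/15, or about 93.3%.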

As noted above, the invention further provides, in one aspect, methods for constructing and/or identifying recombination sites suitable for use with nucleic acid molecules of the invention, as well as recombination sites constructed and/or identified by these methods. In brief, the invention provides methods for constructing and/or identifying recombination sites which are capable of recombining with other recombination sites. For example, the invention provides methods for constructing recombination sites and identifying whether these recombination sites recombine with other recombination sites. Recombination sites which are screened for recombination activity and specificity can be constructed by any number of means, including site-directed mutagenesis and random nucleic acid synthesis.

The invention further provides “single use” recombination sites which undergo recombination one time and then either undergo recombination with low frequency (e.g., have at least five fold, at least ten fold, at least fifty fold, at least one hundred fold, or at least one thousand fold lower recombination activity in subsequent recombination reactions) or are essentially incapable of undergoing recombination. The invention also provides methods for making and using nucleic acid molecules which contain such single use recombination sites and molecules which contain these sites. Examples of methods which can be used to generate and identify such single use recombination sites are set out below.

The att system core integrase binding site comprises an interrupted seven base pair inverted repeat having the following nucleotide sequence:

------>.......<------
caactttnnnnnnnaaagttg, (SEQ ID NO: 39)

as well as variations thereof which can comprise either perfect or imperfect repeats.

The repeat elements can be subdivided into two distal and/or proximal “domains” composed of caac/gttg segments (underlined), which are distal to the central undefined sequence (the nucleotides of which are represented by the letter “n”), and ttt/aaa segments, which are proximal to the central undefined sequence.
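The domain layout described above can be sketched in code. The following is an illustration only; the function names and the dictionary keys are hypothetical labels chosen for this sketch, and the 21 nucleotide length is taken from the sequence of SEQ ID NO:39 as written.

```python
# Complement table for lower case DNA; "n" is left unchanged.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (case-insensitive input)."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def split_core(site: str) -> dict:
    """Split a 21 nt core site into distal, proximal and central segments."""
    site = site.lower()
    if len(site) != 21:
        raise ValueError("sketch assumes a 21 nucleotide core site")
    return {
        "left_distal": site[0:4],       # caac
        "left_proximal": site[4:7],     # ttt
        "central": site[7:14],          # seven undefined nucleotides
        "right_proximal": site[14:17],  # aaa
        "right_distal": site[17:21],    # gttg
    }


def is_perfect_inverted_repeat(site: str) -> bool:
    """True if the right seven bases are the reverse complement of the left seven."""
    site = site.lower()
    return reverse_complement(site[:7]) == site[14:]
```

With this sketch, `is_perfect_inverted_repeat("caactttnnnnnnnaaagttg")` is true, since the reverse complement of caacttt is aaagttg, whereas a site ending in aaacaag is reported as an imperfect repeat.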

Alterations in the sequence composition of the distal and/or proximal domains on one or both sides of the central undefined region can affect the outcome of a recombination reaction. The scope and scale of the effect is a function of the specific alterations made, as well as the particular recombinational event (e.g., LR vs. BP reactions).

For example, it is believed that an attB site altered to have the following nucleotide sequence:

------>.......<------
caactttnnnnnnnaaacaag, (SEQ ID NO: 40)

will functionally interact with a cognate attP and generate attL and attR. However, whichever of the latter two recombination sites acquires the segment containing “caag” (located on the left side of the sequence shown above) will be rendered non-functional to subsequent recombination events. The above is only one of many possible alterations in the core integrase binding sequence which can render att sites non-functional after engaging in a single recombination event. Thus, single use recombination sites may be prepared by altering nucleotides in the seven base pair inverted repeat regions which abut seven base pair overlap regions of att sites. This region is represented schematically as:

CAAC TTT [Seven Base Pair Overlap Region] AAA GTTG

In generating single use recombination sites, one, two, three, four or more nucleotides of the sequences CAACTTT or AAAGTTG (i.e., the seven base pair inverted repeat regions) may be substituted with other nucleotides or deleted altogether. These seven base pair inverted repeat regions represent complementary sequences with respect to each other. Thus, alterations may be made in either seven base pair inverted repeat region in order to generate single use recombination sites. Further, when DNA is double stranded and one seven base pair inverted repeat region is present, the other seven base pair inverted repeat region will also be present on the other strand.

Using the sequence CAACTTT for illustration, examples of seven base pair inverted repeat regions which can form single use recombination sites include, but are not limited to, nucleic acid molecules in which (1) the cytosine at position 1 of the seven base pair inverted repeat region has been deleted or substituted with a guanine, adenine, or thymine; (2) the adenine at position 2 of the seven base pair inverted repeat region has been deleted or substituted with a guanine, cytosine, or thymine; (3) the adenine at position 3 of the seven base pair inverted repeat region has been deleted or substituted with a guanine, cytosine, or thymine; (4) the cytosine at position 4 of the seven base pair inverted repeat region has been deleted or substituted with a guanine, adenine, or thymine; (5) the thymine at position 5 of the seven base pair inverted repeat region has been deleted or substituted with a guanine, cytosine, or adenine; (6) the thymine at position 6 of the seven base pair inverted repeat region has been deleted or substituted with a guanine, cytosine, or adenine; and (7) the thymine at position 7 of the seven base pair inverted repeat region has been deleted or substituted with a guanine, cytosine, or adenine; or any combination of one, two, three, four, or more such deletions and/or substitutions within this seven base pair region. Representative examples of nucleotide sequences of the above described seven base pair inverted repeat regions are set out below in Table 3.
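The single position alterations enumerated above can be generated programmatically. The sketch below is illustrative only (the function names are hypothetical) and covers single substitutions and single deletions; combinations of two or more alterations, which the text also contemplates, are not enumerated. Whether a given variant actually behaves as a single use site would need to be determined experimentally.

```python
BASES = "ACGT"


def single_substitutions(region: str) -> list:
    """Every variant with exactly one position replaced by a different base."""
    region = region.upper()
    variants = []
    for i, base in enumerate(region):
        for alt in BASES:
            if alt != base:
                variants.append(region[:i] + alt + region[i + 1:])
    return variants


def single_deletions(region: str) -> list:
    """Every distinct variant with exactly one position deleted.

    Deleting either base of a run (e.g., the AA or TTT in CAACTTT) gives
    the same string, so duplicates are collapsed.
    """
    region = region.upper()
    return sorted({region[:i] + region[i + 1:] for i in range(len(region))})
```

For CAACTTT this yields 21 single substitution variants (seven positions, three alternative bases each) and four distinct single deletion variants.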

TABLE 3

aagaaaa   aagagcg   aagagaa   aagatat
ccgccac   ccgcctc   ccgcaca   ccgcttt
ggtggga   ggtgctc   ggtgata   ggtgtat
ttctttg   ttctctc   ttctgaa   ttctttt
aatacac   aatagcg   aataaca   aatatat
cctcgga   cctcccg   cctcaca   cctcttt
ggcgaaa   ggcgccg   ggcggaa   ggcgtat
ttgtcac   ttgtgcg   ttgtaca   ttgtttt
acaagga   acaaccg   acaaata   acaattt
caccttg   caccaga   caccgaa   cacctat
gaggcac   gagggcg   gaggaca   gaggttt
tattgga   tattaga   tattaca   tatttat
agaaaaa   agaaaga   agaagaa   agaattt
cgcccac   cgccctc   cgccaca   cgccttt
gcgggga   gcgggcg   gcggata   gcggtat
tcttttg   tcttccg   tcttgaa   tcttttt
ataacac   ataactc   ataaaca   ataattt
ctccaaa   ctccgcg   ctccata   ctcctat
gtgggga   gtggccg   gtgggaa   gtggtat
tgttttg   tgttctc   tgttaca   tgttttt

Representative examples of nucleotide sequences which form single use recombination sites may also be prepared by combining a nucleotide sequence set out in Table 4, Section 1, with a nucleotide sequence set out in Table 4, Section 2. Single use recombination sites may also be prepared by the insertion of one or more (e.g., one, two, three, four, five, six, seven, etc.) nucleotides internally within these regions.

TABLE 4

Section 1 (CAAG)            Section 2 (TTT)
aaaa   cccc   gggg   tttt   aaa   cca   ttc
aaac   ccca   ggga   ttta   aac   cac   ttg
aaag   ccct   gggc   tttc   aag   cgc   tat
aaat   cccg   gggt   tttg   aat   ctc   tct
aaca   ccac   ggag   ttat   aca   ggg   tgt
aaga   ccgc   ggtg   ttct   aga   gga
aata   cctc   ggcg   ttgt   ata   ggc
acaa   cacc   gagg   tatt   caa   ggt
agaa   cgcc   gcgg   tctt   gaa   gag
ataa   ctcc   gtgg   tgtt   taa   gcg
caaa   accc   aggg   attt   ccc   gtg
gaaa   gccc   CGG    cttt   ccg   ttt
taaa   tccc   tggg   gttt   cct   tta

In most instances where one seeks to prevent recombination events with respect to a particular nucleic acid segment, the altered sequence will be located proximally to the nucleic acid segment. Using the following schematic for illustration:

=5′ Nucleic Acid Segment 3′=caac ttt [Seven Base Pair Overlap Region] AAA GTTG,

the lower case nucleotide sequence which represents a seven base pair inverted repeat region (i.e., caac ttt) will generally have a sequence altered by insertion, deletion, and/or substitution to form a single use recombination site when one seeks to prevent recombination at the 3′ end (i.e., proximal end with respect to the nucleic acid segment) of the nucleic acid segment shown. Thus, a single recombination reaction can be used, for example, to integrate the nucleic acid segment into another nucleic acid molecule, after which the recombination site becomes effectively non-functional, preventing the site from engaging in further recombination reactions. Similarly, single use recombination sites can be positioned at both ends of a nucleic acid segment so that the nucleic acid segment can be integrated into another nucleic acid molecule, or circularized, and will remain integrated, or circularized, even in the presence of recombinases.

A number of methods may be used to screen potential single use recombination sites for functional activity (e.g., undergo one recombination event followed by the failure to undergo subsequent recombination events). For example, with respect to the screening of recombination sites to identify those which become non-functional after a single recombination event, a first recombination reaction may be performed to generate a plasmid in which a negative selection marker is linked to one or more potentially defective recombination sites. The plasmid may then be reacted with another nucleic acid molecule which comprises a positive selection marker similarly linked to recombination sites. Thus, this selection system is designed such that molecules which recombine are susceptible to negative selection and molecules which do not recombine may be selected for by positive selection. Using such a system, one may then directly select for desired single use core site mutants.

As one skilled in the art would recognize, any number of screening assays may be designed which achieve the same results as those described above. In many instances, these assays will be designed so that an initial recombination event takes place and then recombination sites which are unable to engage in subsequent recombination events are identified or molecules which contain such recombination sites are selected for. A related screening assay would result in selection against nucleic acid molecules which have undergone a second recombination event. Further, as noted above, screening assays can be designed where there is selection against molecules which have engaged in subsequent recombination events and selection for those which have not engaged in subsequent recombination events.

Single use recombination sites are especially useful for either decreasing the frequency of or preventing recombination when either large numbers of nucleic acid segments are attached to each other or multiple recombination reactions are performed. Thus, the invention further includes nucleic acid molecules which contain single use recombination sites, as well as methods for performing recombination using these sites.

Construction and Uses of Nucleic Acid Molecules of the Invention

As discussed below in more detail, in one aspect, the invention provides a modular system for constructing nucleic acid molecules having particular functions or activities. The invention further provides methods for combining populations of nucleic acid molecules with one or more known or unknown target sequences of interest (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) or with other populations of nucleic acid molecules (known or unknown), thereby creating populations of combinatorial molecules (e.g., combinatorial libraries) from which unique and/or novel molecules (e.g., hybrid molecules) and proteins or peptides encoded by these molecules may be obtained and further analyzed.

The present invention also includes methods for preparing vectors containing more than one nucleic acid insert (e.g., two, three, four, five, six, eight, ten, twelve, fifteen, twenty, thirty, forty, fifty, etc. inserts). In one general embodiment of the invention, vectors of the invention are prepared as follows. Nucleic acid molecules which are to ultimately be inserted into the Destination Vector are obtained (e.g., purchased, prepared by PCR or by the preparation of cDNA using reverse transcriptase). Suitable recombination sites are either incorporated into the 5′ and 3′ ends of the nucleic acid molecules during synthesis or added later. When one seeks to prepare a vector containing multiple nucleic acid inserts, these inserts can be inserted into a vector in either one reaction mixture or a series of reaction mixtures. For example, as shown in FIG. 16, multiple nucleic acid segments can be linked end to end and inserted into a vector using reactions performed, for example, in a single reaction mixture. The nucleic acid segments in this reaction mixture can be designed so that recombination sites on their 5′ and 3′ ends result in their insertion into a Destination Vector in a specific order and a specific 5′ to 3′ orientation. Alternatively, nucleic acid segments can be designed so that they are inserted into a Destination Vector without regard to order, orientation (i.e., 5′ to 3′ orientation), the number of inserts, and/or the number of duplicate inserts.

Further, in some instances, one or more of the nucleic acid segments will have a recombination site on only one end. Also, if desired, this end, or these ends, may be linked to other nucleic acid segments by the use of, for example, ligases or topoisomerases. As an example, a linear nucleic acid molecule with an attR1 site on its 5′ terminus can be recombined with a Destination Vector containing a ccdB gene flanked by an attL1 site and an attL2 site. Before, during, or after an LR reaction, the Destination Vector can be cut, for example, by a restriction enzyme on the side of the attL2 site which is opposite to the ccdB gene. Thus, the Destination Vector will be linear after being cut and undergoing recombination. Further, the attR1 site of the nucleic acid molecule will undergo recombination with the attL1 site of the Destination Vector to produce a linear vector which contains the nucleic acid molecule. The resulting linear product can then be circularized using an enzyme such as a ligase or topoisomerase.

Using the embodiment shown in FIG. 16 to exemplify another aspect of the invention, a first DNA segment having an attL1 site at the 5′ end and an attL3 site at the 3′ end is attached by recombination to a second DNA segment having an attR3 site at the 5′ end and an attL4 site at the 3′ end. A third DNA segment having an attR4 site at the 5′ end and an attL5 site at the 3′ end is attached by recombination with the attL4 site on the 3′ end of the second DNA segment. A fourth DNA segment having an attR5 site at the 5′ end and an attL2 site at the 3′ end is attached by recombination with the attL5 site on the 3′ end of the third DNA segment. The Destination Vector contains an attR1 site and an attR2 site which flank a ccdB gene. Thus, upon reaction with LR CLONASE™, the first, second, third, and fourth DNA segments are inserted into the insertion vector but are flanked or separated by attB1, attB3, attB4, attB5, and attB2 sites. A similar process involving assembly of the lux operon is shown in FIGS. 17A-17B and described below in Example 18.
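The ordering logic of this embodiment can be modeled with a short sketch. It is a bookkeeping illustration only, not a simulation of the recombination chemistry: it assumes that an attL site pairs only with an attR site of the same numbered specificity and that each such pairing leaves an attB site of that specificity in the product, as stated above. The segment names, the `Segment` tuple layout and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# (name, site at 5' end, site at 3' end); a hypothetical representation.
Segment = Tuple[str, str, str]


def specificity(site: str) -> str:
    """Return the numbered specificity of an attL or attR site, e.g. '3'."""
    match = re.fullmatch(r"att[LR](\d+)", site)
    if match is None:
        raise ValueError(f"not an attL/attR site: {site}")
    return match.group(1)


def pairs_in_lr(site_a: str, site_b: str) -> bool:
    """Assumed rule: one attL and one attR with the same specificity."""
    return ({site_a[3], site_b[3]} == {"L", "R"}
            and specificity(site_a) == specificity(site_b))


def assemble(segments: List[Segment], vector_left: str, vector_right: str) -> List[str]:
    """Order segments by matching sites, starting from the vector's left site."""
    remaining = list(segments)
    product: List[str] = []
    open_site = vector_left
    while remaining:
        candidates = [s for s in remaining if pairs_in_lr(open_site, s[1])]
        if len(candidates) != 1:
            raise ValueError(f"no unique partner for {open_site}")
        name, _five_prime, three_prime = candidates[0]
        remaining.remove(candidates[0])
        product += ["attB" + specificity(open_site), name]
        open_site = three_prime
    if not pairs_in_lr(open_site, vector_right):
        raise ValueError("last segment does not pair with the vector")
    product.append("attB" + specificity(vector_right))
    return product


fig16_segments = [
    ("segment3", "attR4", "attL5"),
    ("segment1", "attL1", "attL3"),
    ("segment4", "attR5", "attL2"),
    ("segment2", "attR3", "attL4"),
]
print(assemble(fig16_segments, "attR1", "attR2"))
```

Although the segments are supplied in scrambled order, the site specificities alone determine the result: segments one through four in order, flanked or separated by attB1, attB3, attB4, attB5 and attB2.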

As one skilled in the art would recognize, multiple variations of the process shown in FIG. 16 are possible. For example, various combinations of attB, attP, attL, and attR sites, as well as other recombination sites, can be used. Similarly, various selection markers, origins of replication, promoters, and other genetic elements can be used. Further, regions which allow for integration into eukaryotic chromosomes (e.g., transposable elements) can be added to these vectors.

One example of a multi-reaction process for inserting multiple DNA segments into a vector is shown in FIG. 18. In this exemplary embodiment, three DNA segments recombine with each other in two separate reaction mixtures. The products generated in these mixtures are then mixed together under conditions which facilitate both recombination between the products of the two reaction mixtures and insertion of the linked product into a vector (e.g., a Destination Vector). This embodiment has the advantages that (1) the DNA segments can be inserted directly into a Destination Vector without prior insertion into another vector, and (2) the same att sites, as well as other recombination sites, can be used to prepare each of the linked DNA segments for insertion into the vector.

As one skilled in the art would recognize, multiple variations of the processes described herein are possible. For example, single use recombination sites can be used to connect individual nucleic acid segments, thus eliminating or reducing potential problems associated with arrays of nucleic acid segments engaging in undesired recombination reactions. Further, the processes described above can be used to connect large numbers of individual nucleic acid molecules together in varying ways. For example, nucleic acid segments can be connected randomly, or in a specified order, both with or without regard to 5′ to 3′ orientation of the segments.

Further, identical copies of one or more nucleic acid segments can be incorporated into another nucleic acid molecule. Thus, the invention also provides nucleic acid molecules which contain multiple copies of a single nucleic acid segment. Further, the selection of recombination sites positioned at the 5′ and 3′ ends of these segments can be used to determine the exact number of identical nucleic acid segments which are connected and then inserted, for example, into a vector. Such vectors may then be inserted into a host cell where they can, for example, replicate autonomously or integrate into one or more nucleic acid molecules which normally reside in the host cell (e.g., integrate by site-specific recombination or homologous recombination).

Nucleic acid molecules which contain multiple copies of a nucleic acid segment may be used, for example, to amplify the copy number of a particular gene. Thus, the invention also provides methods for gene amplification, nucleic acid molecules which contain multiple copies of a nucleic acid segment, and host cells which contain nucleic acid molecules of the invention.

As another example, two different nucleic acid segments can be connected using processes of the invention. Recombination sites can be positioned on these segments, for example, such that the segments alternate upon attachment (e.g., Segment A+Segment B+Segment A+Segment B, etc.). A nucleic acid molecule having such a structure will be especially useful when one seeks to use increased copy number of a nucleic acid to increase the amount of expression product produced. In such an instance, “Segment A” can be, for example, a nucleic acid molecule comprising an inducible promoter and “Segment B” can be, for example, a nucleic acid molecule comprising an ORF. Thus, cells can be prepared which contain the above construct and do not express substantial quantities of the product of Segment B in the absence of the inducing signal but produce high levels of this product upon induction. Such a system will be especially useful when the Segment B expression product is toxic to cells. Thus, the methods set out above can be used for the construction and maintenance of cells which contain Segment B in the absence of deleterious effects resulting from the Segment B expression product. Further, induction of expression of the ORF residing in Segment B can then be used, for example, to transiently produce high levels of the Segment B expression product.

Another example of a multi-step process for inserting multiple DNA segments into a vector is shown in FIG. 19. In this embodiment, three DNA segments are linked to each other in separate recombination reactions and then inserted into separate vectors using LR and BP CLONASE™ reactions. After construction of these two vectors, the inserted DNA segments are transferred to another vector using an LR reaction. This results in all six DNA segments being inserted into a single Destination Vector. As one skilled in the art would recognize, numerous variations of the process shown in FIG. 19 are possible and are included within the scope of the invention.

The number of genes which may be connected using methods of the invention in a single step will in general be limited by the number of recombination sites with different specificities which can be used. Further, as described above and represented schematically in FIGS. 18 and 19, recombination sites can be chosen so as to link nucleic acid segments in one reaction and not engage in recombination in later reactions. For example, again using the process set out in FIG. 18 for reference, a series of concatamers of ordered nucleic acid segments can be prepared using attL and attR sites and LR CLONASE™. These concatamers can then be connected to each other and, optionally, other nucleic acid molecules using another LR reaction. Numerous variations of this process are possible.

Similarly, single use recombination sites may be used to prevent nucleic acid segments, once incorporated into another nucleic acid molecule, from engaging in subsequent recombination reactions. The use of single use recombination sites allows for the production of nucleic acid molecules prepared from an essentially limitless number of individual nucleic acid segments.

In one aspect, the invention further provides methods for combining nucleic acid molecules in a single population with each other or with other molecules or populations of molecules, thereby creating populations of combinatorial molecules from which unique and/or novel molecules (e.g., hybrid molecules) and proteins or peptides encoded by these molecules may also be obtained and further analyzed. The invention further provides methods for screening populations of nucleic acid molecules to identify those which have particular activities or which encode expression products (e.g., RNAs or polypeptides) which have particular activities. Thus, methods of the invention can be used to combine nucleic acid segments which encode functional domains (e.g., SH3 domains, antibody binding sites, transmembrane domains, signal peptides, enzymatic active sites) in various combinations with each other and to identify products of these methods which have particular activities.

For example, nucleic acid segments which contain transcriptional regulatory sequences can be identified by the following methods. The nucleic acid molecules of a genomic DNA library are modified to contain recombination sites on their 5′ and 3′ termini. These nucleic acid molecules are then inserted into a Destination Vector such that they are located 5′ to a selectable marker. Thus, expression of the selectable marker will occur in vectors where the marker is in operable linkage with a nucleic acid molecule which activates its transcription. The invention thus further provides isolated nucleic acid molecules which are capable of activating transcription. In many instances, these nucleic acid molecules which activate transcription will be identified using methods and/or compositions of the invention.

Further, because some transcriptional regulatory sequences activate gene expression in a tissue-specific manner, methods of the invention can be used to identify tissue-specific transcriptional regulatory sequences. For example, when one seeks to identify transcriptional regulatory sequences which activate transcription in a specific cell or tissue type, the above screening process can be performed in cells of that cell or tissue type. Similarly, when one seeks to identify regulatory sequences which activate transcription in cells at a particular time, at a particular stage of development, or incubated under particular conditions (e.g., at a particular temperature), the above screening process can be performed in cells at an appropriate time, at the particular stage of development or incubated under the particular conditions. Once a sequence which activates transcription has been identified using such methods, the transcriptional regulatory sequence can then be tested to determine if it is capable of activating transcription in other cell types or under conditions other than those which resulted in its identification and/or selection. Thus, in one general aspect, the invention provides methods for constructing and/or identifying transcriptional regulatory sequences, as well as nucleic acid molecules which contain transcriptional regulatory sequences identified by methods of the invention in operable linkage with nucleic acid segments which encode expression products and methods for preparing such molecules.

Methods similar to those described above can also be used to identify origins of replication. Thus, the invention further includes methods for identifying nucleic acid molecules which contain origins of replication, as well as nucleic acid molecules which contain origins of replication identified by methods of the invention and methods for preparing such molecules.

As discussed below in Example 1, the invention is thus particularly suited for the construction of combinatorial libraries. For example, methods of the invention can also be used to “shuffle” nucleic acid molecules which encode domains and regions of proteins to generate new nucleic acid molecules which can be used to express proteins having specific properties or activities. In such embodiments, nucleic acid segments which encode portions of proteins are joined and then screened for one or more properties or activities.

The nucleic acid segments in these combinatorial libraries may be prepared by any number of methods, including reverse transcription of mRNA. Altered forms of the nucleic acid segments in these libraries may be generated using methods such as error prone PCR. In many applications, it will be desirable for the nucleic acid segments in these libraries to encode subportions of protein. When this is the case, the methods can be adjusted to generate populations of nucleic acid segments the majority of which do not contain full length ORFs. This can be done, for example, by shearing the cDNA library and then separating the sheared molecules (e.g., using polyacrylamide or agarose gel electrophoresis). Fragments between, for example, 300 and 600 nucleotides in length (fragments which potentially encode 100 to 200 amino acid residues) may then be recombined and inserted into a vector in operable linkage with a transcriptional regulatory sequence. Polypeptide expression products of the individual members of such a combinatorial library may then be screened to identify those with particular properties or activities.
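The size arithmetic in the preceding paragraph (three nucleotides per codon, so 300 to 600 nucleotides correspond to at most 100 to 200 amino acid residues) can be illustrated with a minimal Python sketch. The function names and the default size window are assumptions chosen for illustration only; in practice the size selection is performed physically, for example by gel electrophoresis, and not computationally.

```python
def max_encoded_residues(length_nt: int) -> int:
    """Upper bound on residues encoded by a fragment: one residue per 3 nt."""
    if length_nt < 0:
        raise ValueError("fragment length cannot be negative")
    return length_nt // 3


def size_select(fragments, min_nt=300, max_nt=600):
    """Keep only fragments whose length lies in [min_nt, max_nt] (inclusive).

    This mimics, in silico, excising a size window from a gel. The default
    window is the 300-600 nt example range given in the text.
    """
    return [f for f in fragments if min_nt <= len(f) <= max_nt]


if __name__ == "__main__":
    # Hypothetical sheared fragments, represented only by their lengths.
    sheared = ["A" * n for n in (150, 300, 450, 600, 900)]
    for frag in size_select(sheared):
        print(len(frag), "nt ->", max_encoded_residues(len(frag)), "residues max")
```

The bound is an upper limit only: a given fragment encodes fewer residues if it is out of frame at either end or contains a stop codon.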

The invention further provides methods for producing combinatorial libraries generated using exon nucleic acid derived from genomic DNA. Intron/exon splice boundaries are known in the art; thus the locations of exons in genomic DNA can be identified using routine, art-known methods without undue experimentation. Further, primers corresponding to intron/exon splice boundaries can be used to generate nucleic acid molecules which correspond to exon sequences. Further, these nucleic acid molecules may then be connected to each other to generate combinatorial libraries comprising nucleic acid molecules which correspond to exon sequences. For example, primers corresponding to intron/exon splice boundaries can be used to generate nucleic acid molecules which correspond to exon sequences using PCR. Recombination sites may then be added to the termini of the resulting PCR products using ligases or by amplifying the sequences using primers containing recombination sites. The PCR products may then be connected to each other using recombination reactions and inserted into an expression vector. The resulting combinatorial library may then be screened to identify nucleic acid molecules which, for example, encode polypeptides having particular functions or activities. Further, recombination sites in expression products (e.g., RNA or protein) of nucleic acid molecules of the combinatorial library can be removed by splicing as described elsewhere herein.

Further, nucleic acid molecules used to produce combinatorial libraries, as well as the combinatorial libraries themselves, may be mutated to produce nucleic acid molecules which are, on average, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to the corresponding original nucleic acid molecules. Similarly, nucleic acid molecules used to produce combinatorial libraries may be mutated to produce nucleic acid molecules which encode polypeptides that are, on average, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to polypeptides encoded by the corresponding original nucleic acid molecules.
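As an illustration of how an average percent identity of the kind recited above might be computed, the following Python sketch compares each mutated sequence with its original, position by position. It assumes the two sequences are already aligned and of equal length (no gaps); the text does not prescribe any particular alignment or scoring method, and the function names here are hypothetical.

```python
def percent_identity(original: str, mutant: str) -> float:
    """Percent of positions at which two pre-aligned, equal-length sequences match."""
    if len(original) != len(mutant):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not original:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(original, mutant))
    return 100.0 * matches / len(original)


def mean_percent_identity(pairs) -> float:
    """Average percent identity over (original, mutant) pairs of a library."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (original, mutant) pair is required")
    return sum(percent_identity(o, m) for o, m in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical 10-nt sequences, for illustration only.
    library = [("ACGTACGTAC", "ACGTACGTAA"), ("ACGTACGTAC", "ACGTACGTAC")]
    print(mean_percent_identity(library))
```

The same functions apply unchanged to amino acid sequences when comparing encoded polypeptides.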

In one aspect the invention provides methods for generating and identifying dominant/negative suppressors of biological processes or biological pathways. For example, combinatorial libraries described above can be screened for dominant/negative activity. In general, dominant/negative activity results in the suppression of a biological process or biological pathway. In most instances, dominant/negative suppressors exhibit their effects through interaction with cellular components. For example, many dominant/negative suppressors contain domains having binding activities associated with one or more cellular proteins but do not have other activities associated with the cellular proteins. While not intending to be bound by theory, upon expression in a cell, dominant/negative suppressors generally interact with one or more cellular ligands and block activation by cellular proteins. Thus, one mechanism by which dominant/negative suppressors are believed to interfere with normal cellular processes is by ligand sequestration.

Dominant/negative activity can be conferred by mutations in a wild-type protein such as an alteration of a single amino acid residue or a deletion of an entire region of the protein. Oury et al., J. Biol. Chem. 275:22611-22614 (2000), for example, describe a dominant/negative receptor where dominant/negative activity results from the deletion of a single amino acid residue.

Protein fragments can also have dominant/negative activity. For example, McNellis et al., Plant Cell 8:1491-1503 (1996), describe an N-terminal fragment of constitutive photomorphogenic 1 protein (COP1) which has dominant/negative activity when expressed in Arabidopsis seedlings.

Any number of assays can be used to screen for dominant/negative activities. Maemura et al., J. Biol. Chem. 274:31565-31570 (1999), for example, describe a deletion mutant of a transcription factor referred to as endothelial PAS domain protein 1 (EPAS1) which has dominant/negative activity. In particular, Maemura et al. demonstrated that expression of the EPAS1 mutant in cells inhibits induction of VEGF mRNA production, an activity associated with wild-type EPAS1.

The invention also provides methods for identifying nucleic acid molecules which encode polypeptides having particular functions or activities, as well as nucleic acid molecules produced by these methods, expression products of these nucleic acid molecules, and host cells which contain these nucleic acid molecules. Such functions or activities include secretion from cells, enzymatic activities, ligand binding activities (e.g., binding affinity for metal ions, cell surface receptors, nucleic acids, soluble proteins), and the ability to target the expression product to a sub-cellular localization (e.g., localization to mitochondria, chloroplasts, endoplasmic reticulum, etc.). Assays for identifying these nucleic acid molecules will generally be designed to identify the function or activity associated with the polypeptide.

The invention also provides methods for identifying nucleic acid molecules which encode polypeptides having regions which interact with other polypeptides. One example of such a method involves the use of two hybrid assays. (See, e.g., Fields et al., U.S. Pat. No. 5,667,973, the entire disclosure of which is incorporated herein by reference.) More specifically, nucleic acid molecules can be prepared using methods of the invention which encode a fusion protein between a polypeptide (e.g., a Gal4 N-terminal domain) that exhibits a particular function when in close proximity with another polypeptide (e.g., a Gal4 C-terminal domain) and a protein or region of a protein for which a ligand is sought. Other nucleic acid molecules are then prepared which encode fusions between the other polypeptide referred to in the previous sentence and protein segments encoded by a combinatorial library. Thus, nucleic acid segments in the combinatorial library which encode desired ligands can be identified by screening for activities conferred by bringing the two polypeptides into close proximity with each other.

Phage and bacterial surface display libraries may also be generated by methods of the invention to identify domains which have particular functional activities (e.g., binding activity for a particular ligand). For example, Kim et al., Appl. Environ. Microbiol. 66:788-793 (2000), describe a bacterial surface display method for selectively screening for improved variants of carboxymethyl cellulase (CMCase). According to this method, a library of mutated CMCase genes is generated by DNA shuffling and fused to the ice nucleation protein (Inp) gene, which results in the fusion proteins being displayed on the bacterial cell surface.

The invention thus provides methods for identifying nucleic acid segments which encode proteins or protein regions that interact with other proteins or have particular functional activities, as well as nucleic acid segments identified by such methods and polypeptide expression products of these nucleic acid segments. In one aspect, methods of the invention involve generating combinatorial libraries and screening these libraries to identify individual nucleic acid molecules which encode expression products that interact with a particular protein or have a particular activity. In many instances, the combinatorial libraries described above will encode fusion proteins.

Thus, methods of the invention can be used to prepare and identify nucleic acid molecules which encode proteins and protein variants having particular properties, functions or activities. One example of a protein property which is readily assayable is solubility. For example, fluorescence generated by GFP is quenched when an insoluble GFP fusion protein is produced. Further, alterations in a relatively small number of amino acid residues of a protein (e.g., one, two, three, four, etc.), when appropriately positioned, can alter the solubility of that protein. Thus, combinatorial libraries which express GFP fusion proteins can be used to isolate proteins and protein variants which have altered solubility. In one specific example, a combinatorial library designed to express GFP fused with variants of a single, insoluble polypeptide can be used to isolate nucleic acid molecules which encode soluble variants of the polypeptide.

Methods of the invention can be used to construct nucleic acid molecules which contain two or more nucleic acid segments, wherein expression of one nucleic acid segment is facilitated by the expression product of one of the other nucleic acid segments. For example, one nucleic acid segment may be operably linked to a T7 polymerase promoter and another nucleic acid segment encodes a T7 polymerase. Thus, the nucleic acid segment operably linked to the T7 polymerase promoter will be expressed upon expression of the T7 polymerase. Numerous variations of such systems fall within the scope of the invention. For example, nucleic acid encoding components or having particular activities referred to above can reside in a vector into which one or more of the nucleic acid segments are inserted.

Methods of the invention can also be used to construct nucleic acid molecules which encode more than one subunit of a multi-subunit enzyme. Further, expression of each of the subunits of this enzyme may be regulated by the same promoter or by different promoters. When the same promoter is used to drive expression of nucleic acids which encode two or more proteins, the mRNA may contain, for example, one or more internal ribosome entry sites (IRES) which allow for translation of protein encoded by RNA which is 3′ to the 5′ most coding sequence.

Methods of the invention can be used to construct nucleic acid molecules and cells which contain a wide variety of specific inserts. Thus, in one aspect, methods of the invention can be used to prepare nucleic acid molecules and cells which contain multiple genes which encode specific products. These methods allow for the generation of nucleic acid molecules and organisms which have specific characteristics. For example, as discussed below in Example 18, nucleic acids which contain all of the genes involved in a particular biological pathway can be prepared. Such genes may each be linked to different transcriptional regulatory sequences or one or more copies of the same transcriptional regulatory sequence. In addition, genes involved in the same or different biological pathways or biological processes may be operably linked to transcriptional regulatory sequences which facilitate transcription in the presence of the same or different inducing agents, under the same or different environmental conditions (e.g., temperature), or in the same or different cell types. Further, when genes encode polypeptide expression products involved in a pathway or process, one or more of these expression products may be expressed as fusion proteins. Additionally, cells can be constructed using methods of the invention which contain inserted nucleic acid segments that encode gene products involved in more than one different biological pathway or biological process.

One may also use methods of the invention, for example, to modify one or more particular nucleic acid segments in a multi-nucleic acid segment array constructed with a multisite recombination system. Using the lux operon construct shown in FIG. 17B for illustration, where each gene is flanked by attB sites having different recombination specificities, one or more specific nucleic acid segments in the molecule may be substituted with another nucleic acid segment. For example, the second coding region in the lux operon construct shown in FIG. 17B, luxD, can be replaced by reacting the vector containing the operon with an appropriate plasmid (e.g., a pDONR plasmid), such that luxD is substituted with an element comprising attRx-ccdB-cat-attRy to create a vector (i.e., an output construct) wherein the locus previously occupied by luxD becomes an acceptor site for Entry clones with an attLx-gene-attLy configuration. The product vector may then be reacted with an attLx-gene-attLy Entry clone, which will result in the replacement of the attRx-ccdB-cat-attRy cassette with the new gene flanked by attBx and attBy. In related embodiments, populations of Entry clones with the general configuration of attLx-gene-attLy may be reacted with the product vector, prepared as described above, such that a population of output constructs is generated and for any given construct in the population the segment comprising attRx-ccdB-cat-attRy will have been replaced by another nucleic acid segment flanked by attBx and attBy. In any given output construct within the population, the attRx-ccdB-cat-attRy cassette will have been replaced by a new gene flanked by attBx and attBy. Thus, the composition of a given nucleic acid segment array can be permuted in a parallel manner, while other genes in the operon construct remain substantially unaffected by these manipulations.
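The two-step substitution described above can be followed more easily with a small bookkeeping model. The Python sketch below is illustrative only: it represents a construct as an alternating list of site labels and payloads, and it tracks nothing but which site type (attB, attR) flanks which payload. The layout of the example array, the specificity labels "w" and "z", the gene names "geneN1" and "geneN2", and the function names are all assumptions introduced here; they are not taken from FIG. 17B.

```python
def _kind(site: str) -> str:
    return site[3]          # "attBx" -> "B"


def _spec(site: str) -> str:
    return site[4:]         # "attBx" -> "x"


def install_acceptor(array, gene):
    """Replace an attB-flanked gene with a ccdB-cat cassette flanked by attR sites.

    Models the reaction with a donor plasmid: attBx-gene-attBy becomes
    attRx-ccdB-cat-attRy, with the specificities (x, y) preserved.
    """
    i = array.index(gene)
    left, right = array[i - 1], array[i + 1]
    if _kind(left) != "B" or _kind(right) != "B":
        raise ValueError("gene must be flanked by attB sites")
    out = list(array)
    out[i - 1:i + 2] = ["attR" + _spec(left), "ccdB-cat", "attR" + _spec(right)]
    return out


def insert_entry(array, entry):
    """Replace a matching attR-ccdB-cat-attR cassette with the Entry clone's gene.

    `entry` is (left_site, gene, right_site) with attL sites; the gene ends up
    flanked by attB sites of the same specificities.
    """
    left, gene, right = entry
    if _kind(left) != "L" or _kind(right) != "L":
        raise ValueError("Entry clone must carry attL sites")
    want = ["attR" + _spec(left), "ccdB-cat", "attR" + _spec(right)]
    for i in range(1, len(array) - 1):
        if array[i - 1:i + 2] == want:
            out = list(array)
            out[i - 1:i + 2] = ["attB" + _spec(left), gene, "attB" + _spec(right)]
            return out
    raise ValueError("no acceptor cassette with matching specificities")


if __name__ == "__main__":
    # Hypothetical three-gene excerpt of an operon construct.
    operon = ["attBw", "luxC", "attBx", "luxD", "attBy", "luxA", "attBz"]
    acceptor = install_acceptor(operon, "luxD")
    entries = [("attLx", "geneN1", "attLy"), ("attLx", "geneN2", "attLy")]
    for construct in (insert_entry(acceptor, e) for e in entries):
        print("-".join(construct))
```

Reacting one acceptor construct with a list of Entry clones, as in the final loop, corresponds to the population embodiment: each output construct differs only at the locus formerly occupied by luxD, while the other payloads are unchanged.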

Further, nucleic acid segments which encode expression products involved in one or more specific biological processes or pathways may be recombined on supports. For example, a first nucleic acid molecule which has a free end on which there is a recombination site and encodes one of three enzymes involved in a biological pathway or process can be attached to a support. Nucleic acid molecules of a library having recombination sites on at least one end which are capable of recombining with the nucleic acid molecule attached to the support can then be contacted with the support under conditions which facilitate recombination, leading to the attachment of a second nucleic acid molecule to the first nucleic acid molecule. A similar process can be used to attach a third nucleic acid molecule to the free end of the second nucleic acid molecule. These resulting nucleic acid products may then be either released from the support prior to assaying for biological activity or such assaying may be performed while the nucleic acid products remain attached to the support. Examples of assays which can be performed are hybridization assays to detect whether specific nucleic acid molecules are present, assays for polypeptide expression products of the connected nucleic acid molecules, or assays for end products produced by the polypeptide expression products (e.g., taxol, amino acids, carbohydrates, etc.) of the connected nucleic acid molecules.

In embodiments related to the above, nucleic acid segments may be cycled on and off the supports described above. Thus, after a second nucleic acid molecule has recombined with the first nucleic acid molecule, a second recombination reaction, for example, could be used to release the second nucleic acid molecule.

Thus, in one aspect, the invention provides methods for performing recombination between nucleic acid molecules wherein at least one of the nucleic acid molecules is bound to a support. The invention further provides methods for identifying nucleic acid molecules involved in the same biological process or pathway by recombining these nucleic acid molecules on supports (e.g., solid and semi-solid supports). The invention thus provides methods for screening nucleic acid libraries to identify nucleic acid molecules which encode expression products involved in particular biological processes or pathways, as well as nucleic acid molecules identified by these methods, expression products produced from the nucleic acid molecules, and products produced by these biological processes or pathways.

The phrases “biochemical pathway” and “biological pathway” refer to any series of related biochemical reactions that are carried out by an organism or cell. Such pathways may include but are not limited to biosynthetic or biodegradation pathways, or pathways of energy generation or conversion.

Nucleic acid molecules of the invention can be used for a wide variety of applications. For example, methods of the invention can be used to prepare Destination Vectors which contain all of the structural genes of an operon. As discussed below in Example 18, the lux operon has been reconstructed using nucleic acids encoding the luxCDABE genes obtained from the bioluminescent bacterium Vibrio fischeri.

Further, as noted above, expression products of nucleic acid molecules of the invention, including multiple proteins which are part of the same or different biological pathway or process, can be produced as fusion proteins. These fusion proteins may contain amino acids which facilitate purification (e.g., 6 His tag), “target” the fusion protein to a particular cellular compartment (e.g., a signal peptide), facilitate solubility (e.g., maltose binding protein), and/or alter the characteristics of the expression product of the cloned gene (e.g., the Fc portion of an antibody molecule, a green fluorescent protein (GFP), a yellow fluorescent protein (YFP), or a cyan fluorescent protein (CFP)).

Methods of the invention can also be used to prepare nucleic acid molecules which, upon expression, produce fusion proteins having more than one property, function, or activity. One example of such a nucleic acid molecule is a molecule which encodes a three component fusion protein comprising a polypeptide of interest, Domain II of Pseudomonas exotoxin, and a polypeptide which promotes binding of the fusion protein to a cell type of interest. Domain II of Pseudomonas exotoxin often confers upon fusion proteins the ability to translocate across cell membranes. Thus, the expression product could be designed so that it both localizes to a particular cell-type and crosses the cell membrane. An expression product of this type would be especially useful when, for example, the polypeptide of interest is cytotoxic (e.g., induces apoptosis). Nucleic acid molecules which encode proteins similar to those described above are described in Pastan et al., U.S. Pat. No. 5,328,984.

Further, the expression product can be produced in such a manner as to facilitate its export from the cell. For example, these expression products can be fusion proteins which contain a signal peptide which results in export of the protein from the cell. One application where cell export may be desirable is where the proteins that are to be exported are enzymes which interact with extracellular substrates.

In one aspect, the invention provides methods for preparing nucleic acid molecules which encode one or more expression products involved in the same or different biological pathway or process, as well as cells which contain these nucleic acid molecules and the resulting products of such biological pathways or processes. For example, methods of the invention can be used to construct cells which export multiple proteins involved in the same or different biological processes. Thus, in one aspect, the invention provides a system for cloning multiple nucleic acid segments in a cell which exports one or more of the expression products of these nucleic acid segments (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.). Further, these expression products may perform functions (e.g., catalyze chemical reactions) in extracellular media (e.g., culture media, soils, salt water marshes, etc.).

When nucleic acid molecules are prepared and/or expressed using methods of the invention, these nucleic acid molecules may encode expression products which are involved in the same or different processes (e.g., biosynthetic pathways, degradation pathways). As explained below, when one seeks to provide a wide range of functional characteristics to an organism, the nucleic acid molecules may encode expression products which confer relatively unrelated properties upon the organism.

Further, nucleic acid molecules can be prepared using methods of the invention which encode all or parts of biosynthetic pathways that lead to desired end products. Further, methods of the invention can be used to generate nucleic acid molecules which encode expression products having unique properties. Thus, the invention also provides methods for generating novel end products of biological pathways or processes. In this regard, methods of the invention are useful for generating and identifying novel compounds, including therapeutic agents. Thus, in one aspect, the invention further provides drug discovery methods and therapeutic agents identified by these methods.

Examples of end products which can be produced by biological pathways or processes reconstituted and/or altered by methods of the invention include chemotherapeutic agents (e.g., antibiotics, antivirals, taxol), carbohydrates, nucleotides, amino acids, lipids, ribosomes, and membrane-bound organelles, as well as novel forms of each. Thus, the methods of the invention can be used to prepare nucleic acids which confer upon cells the ability to produce a wide variety of natural compounds, as well as modified forms of these compounds. Examples of such compounds include those which fall into the following broad classes: anti-bacterial therapeutics, anti-viral therapeutics, anti-parasitic therapeutics, anti-fungal therapeutics, anti-malarial therapeutics, amebicide therapeutics, and anti-neoplastic therapeutics.

Due to the rapid rate at which microorganisms are developing resistance to antibiotics, there is a great need for the development of new antibiotics. Further, it has been postulated that microorganisms will develop resistance more slowly to novel antibiotics for which there is no naturally occurring equivalent. Thus, in one aspect, the invention provides methods for producing novel antibiotics, as well as antibiotics produced by methods of the invention.

One example of an organism which can be produced using methods of the invention is an organism which produces novel antibiotic agents. Stassi et al., Proc. Natl. Acad. Sci. USA 95:7305-7309 (1998) describe the production of novel ethyl-substituted erythromycin derivatives produced by genetically engineered cells of Saccharopolyspora erythraea. Thus, methods of the invention can be used to insert into the cell genetic elements which encode proteins that generate novel antibiotics. The invention further includes cells produced by these methods and methods for using such cells to produce antibiotics, as well as antibiotics produced by the methods of the invention.

Nucleic acid molecules encoding products involved in biosynthetic pathways for numerous therapeutic agents are known in the art. For example, genes and enzymes involved in the biosynthesis of β-lactam antibiotics are described in Martin, Appl. Microbiol. Biotechnol. 50(1):1-15 (1998). Thus, in specific aspects, the invention includes methods for producing these antibiotics and altered forms of these antibiotics, as well as the antibiotics themselves.

The invention further provides methods for producing anti-bacterial therapeutics, anti-viral therapeutics, anti-parasitic therapeutics, anti-fungal therapeutics, anti-malarial therapeutics, amebicide therapeutics, and anti-neoplastic therapeutics and altered forms of such agents, as well as the agents themselves. Examples of anti-bacterial therapeutics include compounds such as penicillins, ampicillin, amoxicillin, cyclacillin, epicillin, methicillin, nafcillin, oxacillin, cloxacillin, dicloxacillin, flucloxacillin, carbenicillin, cephalexin, cephradine, cefadroxil, cefaclor, cefoxitin, cefotaxime, ceftizoxime, cefmenoxime, ceftriaxone, moxalactam, imipenem, clavulanate, timentin, sulbactam, erythromycin, neomycin, gentamycin, streptomycin, metronidazole, chloramphenicol, clindamycin, lincomycin, quinolones, rifampin, sulfonamides, bacitracin, polymyxin B, vancomycin, doxycycline, methacycline, minocycline, tetracycline, amphotericin B, cycloserine, ciprofloxacin, norfloxacin, isoniazid, ethambutol, and nalidixic acid, as well as derivatives and altered forms of each of these compounds.

Examples of anti-viral therapeutics include acyclovir, idoxuridine, ribavirin, trifluridine, vidarabine, dideoxycytidine, dideoxyinosine, zidovudine and gancyclovir, as well as derivatives and altered forms of each of these compounds.

Examples of anti-parasitic therapeutics include bithionol, diethylcarbamazine citrate, mebendazole, metrifonate, niclosamide, niridazole, oxamniquine (and other quinine derivatives), piperazine citrate, praziquantel, pyrantel pamoate and thiabendazole, as well as derivatives and altered forms of each of these compounds.

Examples of anti-fungal therapeutics include amphotericin B, clotrimazole, econazole nitrate, flucytosine, griseofulvin, ketoconazole and miconazole, as well as derivatives and altered forms of each of these compounds. Anti-fungal compounds also include aculeacin A and papulocandin B. (See, e.g., Komiyama et al., Biol. Pharm. Bull. (1998) 21(10):1013-1019.)

Examples of anti-malarial therapeutics include chloroquine HCl, primaquine phosphate, pyrimethamine, quinine sulfate, and quinacrine HCl, as well as derivatives and altered forms of each of these compounds.

Examples of amebicide therapeutics include dehydroemetine dihydrochloride, iodoquinol, and paromomycin sulfate, as well as derivatives and altered forms of each of these compounds.

Examples of anti-neoplastic therapeutics include aminoglutethimide, azathioprine, bleomycin sulfate, busulfan, carmustine, chlorambucil, cisplatin, cyclophosphamide, cyclosporine, cytarabine, dacarbazine, dactinomycin, daunorubicin, doxorubicin, taxol, etoposide, fluorouracil, interferon-α, lomustine, mercaptopurine, methotrexate, mitotane, procarbazine HCl, thioguanine, vinblastine sulfate and vincristine sulfate, as well as derivatives and altered forms of each of these compounds.

Additional anti-microbial agents include peptides. Examples of anti-microbial peptides are disclosed in Hancock et al., U.S. Pat. No. 6,040,435 and Hancock et al., Proc. Natl. Acad. Sci. USA 97:8856-8861 (2000).

Nucleic acid molecules can also be prepared using the methods of the invention which encode more than one subunit of a multi-protein complex. Examples of such multi-protein complexes include spliceosomes, ribosomes, the human 26S proteasome, and yeast RNA polymerase III. (See, e.g., Saito et al., Gene 203(2):241-250 (1997); Flores et al., Proc. Natl. Acad. Sci. USA 96(14):7815-7820 (1999).)

Methods of the invention can also be used for the partial synthesis of non-naturally occurring products, as well as variants of these products (e.g., novel variants). For example, microorganisms which express enzymes which catalyze particular reactions can be supplied with precursors which these organisms do not normally produce. In cases where these precursors act as substrates for enzymes expressed by the microorganisms, novel compounds may be produced. “Feeding” processes of this type have been used in the past to produce novel antibiotics. In one aspect, feeding of this type is used in combination with microorganisms which express enzymes encoded by combinatorial libraries described above.

Methods of the invention can be used to either (1) introduce a new pathway into a cell or (2) alter an existing cellular pathway so that, for example, one or more additional catalytic steps (e.g., two, three, four, five, seven, ten, etc.) occur during product synthesis. One example of such an application of methods of the invention involves the modification of a protein which is naturally produced by a cell. In this example, genes encoding one or more catalytic steps which alter the protein (e.g., encode enzymes involved in post-translational modification reactions) are introduced into the cell. For example, nucleic acids which encode enzymes involved in ADP-ribosylation, glycosylation, sialylation, acetylation, ubiquitination, serine to D-alanine conversion, biotinylation, acylation, amidation, formylation, carboxylation, GPI anchor formation, hydroxylation, methylation, myristoylation, oxidation, proteolytic processing, phosphorylation, prenylation, racemization, selenoylation, sulfation, and arginylation can be inserted into the cell. Post-translational modifications of proteins are discussed in PROTEINS-STRUCTURE AND MOLECULAR PROPERTIES, 2nd Ed., T. E. Creighton, W. H. Freeman and Company, New York, 1993; Wold, F., POST-TRANSLATIONAL PROTEIN MODIFICATIONS: PERSPECTIVES AND PROSPECTS, pgs. 1-12 in Post-translational Covalent Modification of Proteins, B. C. Johnson, Ed., Academic Press, New York, 1983; Seifter et al., “Analysis for Protein Modifications and Nonprotein Cofactors”, Meth. Enzymol. (1990) 182:626-646 and Rattan et al., “Protein Synthesis: Post-translational Modifications and Aging”, Ann. NY Acad. Sci. (1992) 663:48-62.

Methods of the invention can be used, for example, to produce cells which contain nucleic acid molecules which encode proteins involved in signaling pathways. Further, these cells may be used to screen agents which modulate cell signaling. For example, cells may be produced using methods of the invention which express all of the components necessary for responding to tumor necrosis factors (TNFs). These cells can then be used to screen agents which either induce TNF mediated responses (TNF agonists) or block TNF mediated responses (TNF antagonists). Thus, included within the scope of the invention are methods for producing cells which can be used to screen for agonists and antagonists of cellular ligands, as well as cells produced by such methods. Further included within the scope of the invention are methods for using cells of the invention to identify agonists and antagonists of cellular ligands and agonists and antagonists identified by methods of the invention.

As noted above, methods of the invention can also be used to generate nucleic acids and cells which produce nutrients such as carbohydrates and amino acids. Carbohydrates and amino acids, as well as other carbon sources, can be used for a number of purposes. For example, carbohydrates and amino acids can be used to prepare culture medium components for growing microorganisms, mammalian cells, and plant cells. Further, these compounds can be added to food products for both humans and livestock. One specific example of a use of carbohydrates and amino acids is in the preparation of nutritional formula for infants. (See, e.g., Highman et al., U.S. Pat. No. 6,120,814.) Thus, the invention further provides food products (e.g., infant formula) made using carbon sources produced using methods of the invention.

Carbon sources which can be produced using cells prepared using methods of the invention include carbohydrates (e.g., glucose, fructose, lactose, molasses, cellulose hydrolyzates, crude sugar hydrolyzates, and starch hydrolyzates), organic acids (e.g., pyruvic acid, acetic acid, fumaric acid, malic acid, and lactic acid), alcohols (e.g., glycerol, 1,3-propanediol, and ethanol), lipids, fatty acids, nucleotides, nucleosides, and amino acids. (See, e.g., Skraly et al., Appl. Environ. Microbiol. 64:98-105 (1998).)

One example of an organism which can be produced using methods of the invention is an organism which has acquired the ability to produce ethanol. Deng et al., Appl. Environ. Microbiol. 65:523-528 (1999), for example, describe Cyanobacteria which have been engineered to produce ethanol. Thus, methods of the invention can be used to insert into cells genetic elements which encode proteins involved in the production of ethanol. The invention further includes cells produced by these methods and methods for using such cells to produce ethanol.

Another example of an organism which can be produced using methods of the invention is an organism which has acquired the ability to produce either poly(3-hydroxyalkanoates) or increased amounts of poly(3-hydroxyalkanoates). Poly(3-hydroxyalkanoates) are compounds which, on extraction from cells, have plastic-like properties. (See, e.g., Madison et al., Microbiol. Molec. Biol. Rev. 63:21-53 (1999).) Thus, methods of the invention can be used to insert into cells genetic elements which encode proteins involved in the production of poly(3-hydroxyalkanoates). The invention further includes cells produced by these methods and methods for using such cells to produce poly(3-hydroxyalkanoates), poly(3-hydroxyalkanoate) derivatives, and compounds formed from poly(3-hydroxyalkanoates).

Amino acids which can be produced using cells prepared using methods of the invention include phenylalanine, tryptophan, tyrosine, leucine, isoleucine, valine, glutamine, asparagine, arginine, lysine, histidine, aspartic acid, glutamic acid, alanine, proline, serine, threonine, methionine, cysteine, and glycine. Genes and enzymes involved in the biosynthesis of amino acids and amino acid precursors in a considerable number of organisms are known in the art. (See, e.g., G. N. Cohen, “The Common Pathway to Lysine, Methionine and Threonine,” pp. 147-171 in Amino Acids: Biosynthesis and Genetic Regulation, K. M. Herrmann and R. L. Somerville, eds., Addison-Wesley Publishing Co., Inc., Reading, Mass. (1983).)

In addition to altering cells to produce new compounds, methods of the invention can also be used to engineer cells so that they either overproduce or underproduce products of the cell's normal metabolism. For example, Donnelly et al., U.S. Pat. No. 5,770,435, describe a mutant strain of E. coli which produces increased amounts of succinic acid. Methods of the invention can be used, for example, to construct nucleic acid molecules which encode enzymes in the succinic acid biosynthetic pathway. Further, the expression of one or more of these enzymes can be regulated at the transcriptional level. Thus, the introduction of these nucleic acid molecules into the above described E. coli cells will effectively result in an amplification of one or more genes in the succinic acid biosynthetic pathway. Further, one or more of these genes can be operably linked to an inducible promoter (e.g., the lacI promoter) so that increased succinic acid production occurs only in the presence of the inducing signal (e.g., IPTG).

Methods of the invention can also be used to generate nucleic acids and cells which produce components and precursors that can be used in manufacturing processes. Examples of such components include plastics, plastic-like compounds (e.g., polyketides), soaps, fertilizers, papers, synthetic rubber, dyes, inks, etc. The invention further includes components and precursors produced by methods and cells of the invention.

Similarly, nucleic acid molecules prepared by the methods of the invention can also be used to down regulate expression of, for example, one or more endogenous genes. One example of this is when nucleic acid inserts prepared by methods of the invention are transcribed to produce antisense RNA. Again, nucleic acid molecules which encode antisense RNAs may be operably linked to a regulatable promoter.

Thus, the invention further includes methods for producing cells which either overproduce or underproduce products of the cell's normal metabolism, as well as cells produced by these methods.

As noted above, nucleic acid molecules prepared by methods of the invention can be used to alter the physical characteristics of an organism so that the organism has particular characteristics. For example, specific enzymes required to produce either recombinant or native proteins having particular glycosylation patterns can be introduced, using the vectors of the invention, into a cell which lacks those enzymes. Glycosylation patterns of proteins have been found to be, to some extent, cell-type and species specific. (See, e.g., Jarvis et al., Curr. Opin. Biotechnol. 9:528-533 (1998).) Thus, in one aspect, the invention provides methods for producing cells which exhibit altered glycosylation pathways, as well as cells produced by these methods and glycosylated compounds produced by these cells. This process is generally termed “glycosylation engineering.” Stanley, Glycobiology 2:99-107 (1992).

For example, bacterial cells which do not glycosylate proteins may be modified using methods of the invention to produce enzymes which glycosylate proteins. Examples of such enzymes include N-acetylglucosaminyltransferases III and V, β1,4-galactosyltransferase, α2,6-sialyltransferase, α2,3-sialyltransferase, α1,3-fucosyltransferases III and VI, and α1,2-mannosyltransferase.

In another aspect, the invention provides methods for producing cells which exhibit altered metabolic properties leading to increased production of compounds synthesized by these cells, as well as cells produced by these methods and products produced by these cells. One example of such methods results in the production of cells which produce increased quantities of precursors for biological pathways. This process is referred to herein as metabolic channeling or funneling. For example, when one seeks to produce a cell which produces increased amounts of serine, nucleic acid molecules which encode enzymes of pathways which lead to the production of 3-phosphoglycerate can be inserted into the cell. Optionally, nucleic acid molecules which encode enzymes involved in the conversion of 3-phosphoglycerate to serine can also be inserted into the cell. Parameters useful for consideration when engineering cells which contain increased intracellular concentrations of precursor pools and compounds include the rate limiting step in the particular pathway and pathway fluxes. (See, e.g., Kholodenko et al., Biotechnol. Bioeng. 59:239-247 (1997).)

Polyketides represent a large family of diverse compounds synthesized from 2-carbon units through a series of condensations and subsequent modifications. Polyketides are produced in many types of organisms, including fungi and numerous bacteria, in particular, the actinomycetes. There are a wide variety of polyketide structures, and polyketides encompass numerous compounds with diverse activities. (See, e.g., PCT publication Nos. WO 93/13663; WO 95/08548; WO 96/40968; WO 97/02358; and WO 98/27203; U.S. Pat. Nos. 4,874,748; 5,063,155; 5,098,837; 5,149,639; 5,672,491; and 5,712,146; and Fu et al., 1994, Biochemistry 33:9321-9326; McDaniel et al., 1993, Science 262:1546-1550; and Rohr, 1995, Angew. Chem. Int. Ed. Engl. 34:881-888, each of which is incorporated herein by reference.)

Polyketide synthases (PKSs) assemble structurally diverse natural products using a common mechanistic strategy that relies on a cysteine residue to anchor the polyketide during a series of decarboxylative condensation reactions that build the final reaction product. PKSs generally catalyze the assembly of complex natural products from simple precursors such as propionyl-CoA and methylmalonyl-CoA in a biosynthetic process that closely parallels fatty acid biosynthesis. Examples of polyketides include callystatin A, ansatrienin A, actinorhodin, rapamycin, methymycin, and pikromycin.

In one aspect, the invention provides methods for preparing nucleic acid molecules which encode one or more PKSs, as well as cells which contain these nucleic acid molecules and the resulting polyketide products. The invention further provides methods for generating novel PKSs using combinatorial libraries and products produced by these novel PKSs (e.g., novel macrolide antibiotics), as well as methods for producing these novel PKS products.

Methods of the invention can also be used to construct strains of microorganisms which are useful for decreasing the toxicity of various agents. Such agents include petroleum-based pollutants (e.g., chlorinated and non-chlorinated aliphatic compounds (e.g., C₅-C₃₆), chlorinated and non-chlorinated aromatic compounds (e.g., C₉-C₂₂), crude oil, refined oil, fuel oils (e.g., Nos. 2, 4 and 6 fuel oils), diesel oils, gasoline, hydraulic oils, kerosene, benzene, toluene, ethylbenzene and xylenes, trimethylbenzenes, naphthalene, anthracene, acenaphthene, acenaphthylene, benzo(a)anthracene, benzo(a)pyrene, benzo(b)fluoranthene, benzo(g,h,i)perylene, benzo(k)fluoranthene, pyrene, methylene chloride, 1,1-dichloroethane, chloroform, 1,2-dichloropropane, dibromochloromethane, 1,1,2-trichloroethane, 2-chloroethylvinyl ether, tetrachloroethene (PCE), chlorobenzene, 1,2-dichloroethane, 1,1,1-trichloroethane, bromodichloromethane, trans-1,3-dichloropropene, cis-1,3-dichloropropene, bromoform, chloromethane, bromomethane, vinyl chloride, chloroethane, 1,1-dichloroethene, trans-1,2-dichloroethene, trichloroethene (TCE), dichlorobenzenes, cis-1,2-dichloroethene, dibromomethane, 1,4-dichlorobutane, 1,2,3-trichloropropane, bromochloromethane, 2,2-dichloropropane, 1,2-dibromoethane, 1,3-dichloropropane, bromobenzene, chlorotoluenes, trichlorobenzenes, trans-1,4-dichloro-2-butene and butylbenzenes).

One example of an organism which can be produced using methods of the invention is an organism which degrades toluene. Panke et al., Appl. Environ. Microbiol. 64:748-751 (1998) describe strains of Pseudomonas putida which convert toluene, as well as several toluene derivatives, to benzoates. Thus, methods of the invention can be used to insert into cells genetic elements which encode proteins that convert toluene, as well as derivatives thereof, to less toxic compounds. The invention further includes cells produced by these methods and methods for using such cells to convert toluene, as well as several toluene derivatives, to less toxic compounds.

Methods of the invention can also be used to prepare organisms suitable for detoxifying non-petroleum agents such as heavy metal ions (e.g., mercury, copper, cadmium, silver, gold, tellurite, selenite, and uranium). Methods by which mercury, for example, can be detoxified include reduction of mercury ions to generate metallic mercury and volatilization. Genes involved in detoxification by bacteria are described in Miller, “Bacterial Detoxification of Hg(II) and Organomercurials”, Essays Biochem. 34:17-30 (1999).

Another example of a heavy metal ion detoxification system has been identified in a strain of Rhodobacter sphaeroides (see O'Gara et al., Appl. Environ. Microbiol. 63(12):4713-4720 (1997)). Tellurite resistance in this strain appears to be conferred by two loci. The first genetic locus contains four genes; two of these genes (i.e., trgA and trgB) confer increased tellurite resistance when inserted into another bacterium. Disruption of another gene at this locus, cysK (cysteine synthase), results in decreased tellurite resistance. The second genetic locus contains the telA gene. Inactivation of telA results in significantly decreased tellurite resistance compared to the wild-type strain.

Microorganisms which are capable of detoxifying agents are described, for example, in Perriello, U.S. Pat. No. 6,110,372. Microorganisms suitable for bioremediation applications include those of the Pseudomonadaceae family, the Actinomycetes family, the Micrococcaceae family, the Vibrionaceae family, the Rhizobiaceae family, the Cytophagaceae family, and the Corynebacterium family. Specific examples of organisms suitable for use after modification using the methods of the invention for bioremediation applications include Pseudomonas rubrisubalbicans, Pseudomonas aeruginosa, Variovorax paradoxus, Nocardia asteroides, Deinococcus radiodurans, Nocardia restricta, Chryseobacterium indologenes, Comamonas acidovorans, Acidovorax delafieldii, Rhodococcus rhodochrous, Rhodococcus erythropolis, Aureobacterium esteroaromaticum, Aureobacterium saperdae, Micrococcus varians, Micrococcus kristinae, Aeromonas caviae, Stenotrophomonas maltophilia, Sphingobacterium thalpophilum, Clavibacter michiganense, Alcaligenes xylosoxydans, Corynebacterium aquaticum B and Cytophaga johnsonae.

Organisms suitable for bioremediation further include plants. Meagher et al., U.S. Pat. No. 5,965,796, for example, describe transgenic plants which express a metal ion resistance protein and reduce metal ions such as those of copper, mercury, gold, cadmium, lead and silver. Further, genes encoding phytochelatins can be introduced into plants to increase phytochelatin synthesis. Phytochelatins are glutathione derivatives which detoxify metal ions through sequestration. Genes from a number of plant species involved in phytochelatin synthesis are discussed in Corbett, “Phytochelatin Biosynthesis and Function in Heavy-Metal Detoxification”, Curr. Opin. Plant Biol. 3(3):211-216 (2000).

Specific plants suitable for bioremediation applications after modification by methods of the invention include Lepidium sativum, Brassica juncea, Brassica oleracea, Brassica rapa, Avena sativa, Triticum aestivum, Helianthus annuus, Colonial bentgrass, Kentucky bluegrass, perennial ryegrass, creeping bentgrass, Bermudagrass, Buffalograss, centipedegrass, switch grass, Japanese lawngrass, coastal panicgrass, spinach, sorghum, tobacco and corn. Methods for generating transgenic plants are known in the art and, as noted above, are described, for example, in Meagher et al., U.S. Pat. No. 5,965,796.

Methods of the invention can also be used to prepare organisms which have diverse characteristics and contain a considerable number of inserted genes. As noted above, methods of the invention can be used to insert an almost unlimited number of nucleic acid segments into cells. For example, in one specific embodiment, the invention provides methods for producing cells which express pesticidal proteins (e.g., pesticidal proteins of Bacillus thuringiensis). (See, e.g., Schnepf et al., Microbiol. Molec. Biol. Rev. 62:775-806 (1998).) Thus, methods of the invention can be used to insert into cells genetic elements which encode pesticidal proteins. The invention further includes cells produced by these methods and methods for using such cells to produce pesticidal proteins. The invention further includes methods for using cells (e.g., bacterial or plant cells) and pesticidal proteins produced by methods of the invention to control insect populations. In certain embodiments, cells produced by methods of the invention and used in methods of the invention will be plant cells.

Thus, in one aspect, methods of the invention may be used to prepare nucleic acid molecules which contain one or more ORFs and/or nucleic acid segments which encode one or more non-protein expression products (e.g., functional RNAs such as tRNAs or ribozymes). In most embodiments of the invention, the number of ORFs and/or nucleic acid segments which encode one or more non-protein expression products will generally range between about 1 and about 300 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 225, 250, 275, 300, etc.). Nucleic acid molecules which contain one or more ORFs and/or nucleic acid segments which encode one or more non-protein expression products will be especially useful for altering organisms to have specified characteristics such as those described above.

Depending on a number of factors, including the number of functional segments present, nucleic acid molecules of the invention will vary considerably in size but, in general, will range from about 0.5 kb to about 300 kb (e.g., about 0.5 kb, about 1 kb, about 2 kb, about 3 kb, about 4 kb, about 5 kb, about 7 kb, about 10 kb, about 12 kb, about 15 kb, about 20 kb, about 40 kb, about 60 kb, about 80 kb, about 100 kb, about 200 kb, about 300 kb, etc.).

In a specific embodiment, the invention further provides methods for introducing nucleic acid molecules of the invention into animals (e.g., humans) and animal cells (e.g., human cells), as part of a gene therapy protocol. Gene therapy refers to therapy performed by the administration to a subject of an expressed or expressible nucleic acid molecule. In many embodiments of the invention, nucleic acid molecules of the invention will encode one or more proteins which mediate at least one therapeutic effect. Thus, the invention provides nucleic acid molecules and methods for use in gene therapy.

Nucleic acid molecules of the invention can be used to prepare gene therapy vectors designed to replace genes which reside in the genome of a cell, to delete such genes, or to insert a heterologous gene or groups of genes. When nucleic acid molecules of the invention function to delete or replace a gene or genes, the gene or genes being deleted or replaced may lead to the expression of either a “normal” phenotype or an aberrant phenotype. One example of an aberrant phenotype is the disease cystic fibrosis. Further, the gene therapy vectors may be either stably maintained (e.g., integrate into cellular nucleic acid by homologous recombination) or non-stably maintained in cells.

Further, nucleic acid molecules of the invention may be used to suppress “abnormal” phenotypes or complement or supplement “normal” phenotypes which result from the expression of endogenous genes. One example of a nucleic acid molecule of the invention designed to suppress an abnormal phenotype would be where an expression product of the nucleic acid molecule has dominant/negative activity. An example of a nucleic acid molecule of the invention designed to supplement a normal phenotype would be where introduction of the nucleic acid molecule effectively results in the amplification of a gene resident in the cell.

Further, nucleic acid molecules of the invention may be used to insert into cells nucleic acid segments which encode expression products involved in each step of particular biological pathways (e.g., biosynthesis of amino acids such as lysine, threonine, etc.) or expression products involved in one or a few steps of such pathways. These nucleic acid molecules can be designed to, in effect, amplify genes encoding expression products in such pathways, insert genes into cells which encode expression products involved in pathways not normally found in the cells, or replace one or more genes involved in one or more steps of particular biological pathways in cells. Thus, gene therapy vectors of the invention may contain nucleic acid which results in the production of one or more products (e.g., one, two, three, four, five, eight, ten, fifteen, etc.). Such vectors, especially those which lead to the production of more than one product, will be particularly useful for the treatment of diseases and/or conditions which result from the expression and/or lack of expression of more than one gene or for the treatment of more than one disease and/or condition.

Thus, in related aspects, the invention provides gene therapy vectors which express one or more expression products (e.g., one or more fusion proteins), methods for producing such vectors, methods for performing gene therapy using vectors of the invention, expression products of such vectors (e.g., encoded RNA and/or proteins), and host cells which contain vectors of the invention.

For general reviews of the methods of gene therapy, see Goldspiel et al., 1993, Clinical Pharmacy 12:488-505; Wu and Wu, 1991, Biotherapy 3:87-95; Tolstoshev, 1993, Ann. Rev. Pharmacol. Toxicol. 32:573-596; Mulligan, 1993, Science 260:926-932; Morgan and Anderson, 1993, Ann. Rev. Biochem. 62:191-217; and May, 1993, TIBTECH 11(5):155-215. Methods commonly known in the art of recombinant DNA technology which can be used are described in Ausubel et al. (eds.), 1993, Current Protocols in Molecular Biology, John Wiley & Sons, NY; and Kriegler, 1990, Gene Transfer and Expression, A Laboratory Manual, Stockton Press, NY.

Delivery of the nucleic acid molecules of the invention into a patient may be either direct, in which case the patient is directly exposed to the nucleic acid or nucleic acid carrying vectors, or indirect, in which case cells are first transformed with the nucleic acid in vitro, then transplanted into the patient. These two approaches are known, respectively, as in vivo and ex vivo gene therapy.

In a specific embodiment, nucleic acid molecules of the invention are directly administered in vivo, where they are expressed to produce one or more expression products. This can be accomplished by any of numerous methods known in the art, such as by constructing an expression vector and administering it so that the nucleic acid molecules become intracellular (e.g., by infection using defective or attenuated retroviral vectors or other viral vectors (see U.S. Pat. No. 4,980,286), by direct injection of naked DNA, by use of microparticle bombardment (e.g., a gene gun; Biolistic, Dupont), by coating with lipids or cell-surface receptors or transfecting agents, by encapsulation in liposomes, microparticles, or microcapsules, by administering them in linkage to a peptide which is known to enter the nucleus, or by administering them in linkage to a ligand subject to receptor-mediated endocytosis (see, e.g., Wu and Wu, 1987, J. Biol. Chem. 262:4429-4432) (which can be used to target cell types specifically expressing the receptors), etc.). In another embodiment, nucleic acid molecules of the invention can be targeted in vivo for cell specific uptake and expression, by targeting a specific receptor (see, e.g., PCT Publications WO 92/06180 dated Apr. 16, 1992 (Wu et al.); WO 92/22635 dated Dec. 23, 1992 (Wilson et al.); WO 92/20316 dated Nov. 26, 1992 (Findeis et al.); WO 93/14188 dated Jul. 22, 1993 (Clarke et al.); and WO 93/20221 dated Oct. 14, 1993 (Young)). Alternatively, nucleic acid molecules of the invention can be introduced intracellularly and incorporated within host cell DNA for expression, by homologous recombination (Koller and Smithies, 1989, Proc. Natl. Acad. Sci. USA 86:8932-8935; Zijlstra et al., 1989, Nature 342:435-438). Examples of nucleic acid constructs suitable for such an application are shown in FIGS. 21C and 22B.

In another specific embodiment, viral vectors that contain nucleic acid sequences encoding an antibody or other antigen-binding protein of the invention are used. For example, a retroviral vector can be used (see Miller et al., 1993, Meth. Enzymol. 217:581-599). These retroviral vectors have been modified to delete retroviral sequences that are not necessary for packaging of the viral genome and integration into host cell DNA. The nucleic acid sequences encoding the antibody to be used in gene therapy are cloned into one or more vectors, which facilitates delivery of the gene into a patient. More detail about retroviral vectors can be found in Boesen et al., 1994, Biotherapy 6:291-302, which describes the use of a retroviral vector to deliver the mdr1 gene to hematopoietic stem cells in order to make the stem cells more resistant to chemotherapy. Other references illustrating the use of retroviral vectors in gene therapy are: Clowes et al., 1994, J. Clin. Invest. 93:644-651; Kiem et al., 1994, Blood 83:1467-1473; Salmons and Gunzberg, 1993, Human Gene Therapy 4:129-141; and Grossman and Wilson, 1993, Curr. Opin. in Genetics and Devel. 3:110-114.

Adenoviruses are other viral vectors that can be used in gene therapy. Adenoviruses are especially attractive vehicles for delivering genes to respiratory epithelia, and the use of such vectors is included within the scope of the invention. Adenoviruses naturally infect respiratory epithelia, where they cause a mild disease. Other targets for adenovirus-based delivery systems are liver, the central nervous system, endothelial cells, and muscle. Adenoviruses have the advantage of being capable of infecting non-dividing cells. Kozarsky and Wilson, 1993, Current Opinion in Genetics and Development 3:499-503 present a review of adenovirus-based gene therapy. Bout et al., 1994, Human Gene Therapy 5:3-10 demonstrated the use of adenovirus vectors to transfer genes to the respiratory epithelia of rhesus monkeys. Other instances of the use of adenoviruses in gene therapy can be found in Rosenfeld et al., 1991, Science 252:431-434; Rosenfeld et al., 1992, Cell 68:143-155; Mastrangeli et al., 1993, J. Clin. Invest. 91:225-234; PCT Publication Nos. WO 94/12649 and WO 96/17053; U.S. Pat. No. 5,998,205; and Wang et al., 1995, Gene Therapy 2:775-783, the disclosures of all of which are incorporated herein by reference in their entireties. In one embodiment, adenovirus vectors are used.

Adeno-associated virus (AAV) and Herpes viruses, as well as vectors prepared from these viruses, have also been proposed for use in gene therapy (Walsh et al., 1993, Proc. Soc. Exp. Biol. Med. 204:289-300; U.S. Pat. No. 5,436,146; Wagstaff et al., Gene Ther. 5:1566-70 (1998)). Herpes viral vectors are particularly useful for applications where gene expression is desired in nerve cells.

Another approach to gene therapy involves transferring a gene to cells in tissue culture by such methods as electroporation, lipofection, calcium phosphate mediated transfection, or viral infection. Usually, the method of transfer includes the transfer of a selectable marker to the cells. The cells are then placed under selection to isolate those cells that have taken up and are expressing the transferred gene. Those cells are then delivered to a patient.

In this embodiment, the nucleic acid is introduced into a cell prior to administration in vivo of the resulting recombinant cell. Such introduction can be carried out by any method known in the art, including but not limited to transfection, electroporation, microinjection, infection with a viral or bacteriophage vector containing the nucleic acid sequences, cell fusion, chromosome-mediated gene transfer, microcell-mediated gene transfer, spheroplast fusion, etc. Numerous techniques are known in the art for the introduction of foreign genes into cells (see, e.g., Loeffler and Behr, 1993, Meth. Enzymol. 217:599-618; Cohen et al., 1993, Meth. Enzymol. 217:618-644; Cline, 1985, Pharmac. Ther. 29:69-92) and may be used in accordance with the present invention, provided that the necessary developmental and physiological functions of the recipient cells are not disrupted. The technique should provide for the stable transfer of the nucleic acid to the cell, so that the nucleic acid is expressible by the cell and, optionally, heritable and expressible by its cell progeny.

The resulting recombinant cells can be delivered to a patient by various methods known in the art. Recombinant blood cells (e.g., hematopoietic stem or progenitor cells) will generally be administered intravenously. The amount of cells envisioned for use depends on the desired effect, patient state, etc., and can be determined by one skilled in the art.

Cells into which a nucleic acid can be introduced for purposes of gene therapy encompass any desired, available cell type, and include but are not limited to epithelial cells, endothelial cells, keratinocytes, fibroblasts, muscle cells, hepatocytes; blood cells such as T-lymphocytes, B-lymphocytes, monocytes, macrophages, neutrophils, eosinophils, megakaryocytes, granulocytes; and various stem or progenitor cells, in particular hematopoietic stem or progenitor cells (e.g., as obtained from bone marrow, umbilical cord blood, peripheral blood, fetal liver, etc.).

In a certain embodiment, the cell used for gene therapy is autologous to the patient.

In an embodiment in which recombinant cells are used in gene therapy, nucleic acid sequences encoding an antibody or other antigen-binding protein are introduced into the cells such that they are expressible by the cells or their progeny, and the recombinant cells are then administered in vivo for therapeutic effect. In a specific embodiment, stem or progenitor cells are used. Any stem and/or progenitor cells which can be isolated and maintained in vitro can potentially be used in accordance with this embodiment of the present invention (see, e.g., PCT Publication WO 94/08598, dated Apr. 28, 1994; Stemple and Anderson, 1992, Cell 71:973-985; Rheinwald, 1980, Meth. Cell Bio. 21A:229; and Pittelkow and Scott, 1986, Mayo Clinic Proc. 61:771).

In a specific embodiment, nucleic acid molecules to be introduced for purposes of gene therapy comprise an inducible promoter operably linked to the coding region, such that expression of the nucleic acid molecules is controllable by controlling the presence or absence of the appropriate inducer of transcription.

The nucleic acid molecules of the invention can also be used to produce transgenic organisms (e.g., animals and plants). Animals of any species, including, but not limited to, mice, rats, rabbits, hamsters, guinea pigs, pigs, micro-pigs, goats, sheep, cows and non-human primates (e.g., baboons, monkeys, and chimpanzees) may be used to generate transgenic animals. Further, plants of any species, including but not limited to Lepidium sativum, Brassica juncea, Brassica oleracea, Brassica rapa, Avena sativa, Triticum aestivum, Helianthus annuus, Colonial bentgrass, Kentucky bluegrass, perennial ryegrass, creeping bentgrass, Bermudagrass, Buffalograss, centipedegrass, switch grass, Japanese lawngrass, coastal panicgrass, spinach, sorghum, tobacco and corn, may be used to generate transgenic plants.

Any technique known in the art may be used to introduce nucleic acid molecules of the invention into organisms to produce the founder lines of transgenic organisms. Such techniques include, but are not limited to, pronuclear microinjection (Paterson et al., Appl. Microbiol. Biotechnol. 40:691-698 (1994); Carver et al., Biotechnology (NY) 11:1263-1270 (1993); Wright et al., Biotechnology (NY) 9:830-834 (1991); and Hoppe et al., U.S. Pat. No. 4,873,191 (1989)); retrovirus mediated gene transfer into germ lines (Van der Putten et al., Proc. Natl. Acad. Sci., USA 82:6148-6152 (1985)), blastocysts or embryos; gene targeting in embryonic stem cells (Thompson et al., Cell 56:313-321 (1989)); electroporation of cells or embryos (Lo, Mol. Cell. Biol. 3:1803-1814 (1983)); introduction of the polynucleotides of the invention using a gene gun (see, e.g., Ulmer et al., Science 259:1745 (1993)); introducing nucleic acid constructs into embryonic pluripotent stem cells and transferring the stem cells back into the blastocyst; and sperm-mediated gene transfer (Lavitrano et al., Cell 57:717-723 (1989)); etc. For a review of such techniques, see Gordon, “Transgenic Animals,” Intl. Rev. Cytol. 115:171-229 (1989), which is incorporated by reference herein in its entirety. Further, the contents of each of the documents recited in this paragraph is herein incorporated by reference in its entirety. See also, U.S. Pat. No. 5,464,764 (Capecchi et al., Positive-Negative Selection Methods and Vectors); U.S. Pat. No. 5,631,153 (Capecchi et al., Cells and Non-Human Organisms Containing Predetermined Genomic Modifications and Positive-Negative Selection Methods and Vectors for Making Same); U.S. Pat. No. 4,736,866 (Leder et al., Transgenic Non-Human Animals); and U.S. Pat. No. 4,873,191 (Wagner et al., Genetic Transformation of Zygotes); each of which is hereby incorporated by reference in its entirety.

Any technique known in the art may be used to produce transgenic clones containing nucleic acid molecules of the invention, for example, nuclear transfer into enucleated oocytes of nuclei from cultured embryonic, fetal, or adult cells induced to quiescence (Campbell et al., Nature 380:64-66 (1996); Wilmut et al., Nature 385:810-813 (1997), each of which is herein incorporated by reference in its entirety).

The present invention provides for transgenic organisms that carry nucleic acid molecules of the invention in all their cells, as well as organisms which carry these nucleic acid molecules in some, but not all, of their cells, i.e., mosaic or chimeric organisms. The nucleic acid molecules of the invention may be integrated as a single copy or as multiple copies such as in concatamers, e.g., head-to-head tandems or head-to-tail tandems. The nucleic acid molecules of the invention may also be selectively introduced into and activated in a particular cell type by following, for example, the teaching of Lasko et al. (Lasko et al., Proc. Natl. Acad. Sci. USA 89:6232-6236 (1992)). The regulatory sequences required for such a cell-type specific activation will depend upon the particular cell type of interest, and will be apparent to those of skill in the art. When it is desired that nucleic acid molecules of the invention be integrated into the chromosomal site of the endogenous gene, this will normally be done by gene targeting. Briefly, when such a technique is to be utilized, vectors containing some nucleotide sequences homologous to the endogenous gene are designed for the purpose of integrating, via homologous recombination with chromosomal sequences, into and disrupting the function of the nucleotide sequence of the endogenous gene. Nucleic acid molecules of the invention may also be selectively introduced into a particular cell type, thus inactivating the endogenous gene in only that cell type, by following, for example, the teaching of Gu et al. (Gu et al., Science 265:103-106 (1994)). The regulatory sequences required for such a cell-type specific inactivation will depend upon the particular cell type of interest, and will be apparent to those of skill in the art. The contents of each of the documents recited in this paragraph are herein incorporated by reference in their entirety.

Once transgenic organisms have been generated, the expression of the recombinant gene may be assayed utilizing standard techniques. Initial screening may be accomplished by Southern blot analysis or PCR techniques to analyze organism tissues to verify that integration of nucleic acid molecules of the invention has taken place. The level of mRNA expression of nucleic acid molecules of the invention in the tissues of the transgenic organisms may also be assessed using techniques which include, but are not limited to, Northern blot analysis of tissue samples obtained from the organism, in situ hybridization analysis, and reverse transcriptase-PCR (RT-PCR). Samples of tissue which express nucleic acid molecules of the invention may also be evaluated immunocytochemically or immunohistochemically using antibodies specific for the expression product of these nucleic acid molecules.

Once the founder organisms are produced, they may be bred, inbred, outbred, or crossbred to produce colonies of the particular organism. Examples of such breeding strategies include, but are not limited to: outbreeding of founder organisms with more than one integration site in order to establish separate lines; inbreeding of separate lines in order to produce compound transgenic organisms that express nucleic acid molecules of the invention at higher levels because of the effects of additive expression of each copy of nucleic acid molecules of the invention; crossing of heterozygous transgenic organisms to produce organisms homozygous for a given integration site in order to both augment expression and eliminate the need for screening of organisms by DNA analysis; crossing of separate homozygous lines to produce compound heterozygous or homozygous lines; and breeding to place the nucleic acid molecules of the invention on a distinct background that is appropriate for an experimental model of interest.
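As a non-limiting illustration of the heterozygous cross described above, the following minimal sketch enumerates the expected offspring genotypes at a single integration site. It assumes simple Mendelian segregation at one locus with no effect on viability; the allele labels "T" (integration present) and "w" (no integration) and the function name `cross` are hypothetical and are used here only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent_a: str, parent_b: str) -> dict[str, Fraction]:
    """Expected genotype frequencies among offspring at one integration site.

    Each parent is a two-character genotype, e.g. "Tw" for a heterozygote.
    Assumes each allele of a parent is transmitted with equal probability.
    """
    # Pair every allele of one parent with every allele of the other;
    # sorting makes "wT" and "Tw" count as the same genotype.
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent_a, parent_b))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Crossing two heterozygous transgenic organisms.
offspring = cross("Tw", "Tw")
print(offspring["TT"])  # expected fraction homozygous for the integration site
```

Under these assumptions one quarter of the offspring are expected to be homozygous ("TT"), and crossing two such homozygotes yields only homozygous offspring, which is why no further screening by DNA analysis would be needed for that line.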

Transgenic and “knock-out” organisms of the invention have uses which include, but are not limited to, model systems (e.g., animal model systems) useful in elaborating the biological function of expression products of nucleic acid molecules of the invention, studying conditions and/or disorders associated with aberrant expression of expression products of nucleic acid molecules of the invention, and in screening for compounds effective in ameliorating such conditions and/or disorders.

As one skilled in the art would recognize, in many instances when nucleic acid molecules of the invention are introduced into metazoan organisms, it will be desirable to operably link sequences which encode expression products to tissue-specific transcriptional regulatory sequences (e.g., tissue-specific promoters) where production of the expression product is desired. Such promoters can be used to facilitate production of these expression products in desired tissues. A considerable number of tissue-specific promoters are known in the art. Further, methods for identifying tissue-specific transcriptional regulatory sequences are described elsewhere herein.

Host Cells

The invention also relates to host cells comprising one or more of the nucleic acid molecules or vectors of the invention (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.), particularly those nucleic acid molecules and vectors described in detail herein. Representative host cells that may be used according to this aspect of the invention include, but are not limited to, bacterial cells, yeast cells, plant cells and animal cells. Preferred bacterial host cells include Escherichia spp. cells (particularly E. coli cells and most particularly E. coli strains DH10B, Stbl2, DH5, DB3 (deposit No. NRRL B-30098), DB3.1 (preferably E. coli LIBRARY EFFICIENCY® DB3.1™ Competent Cells; Invitrogen Corporation, Carlsbad, Calif.), DB4 and DB5 (deposit Nos. NRRL B-30106 and NRRL B-30107 respectively, see U.S. application Ser. No. 09/518,188, filed Mar. 2, 2000, the disclosure of which is incorporated by reference herein in its entirety), JDP682 and ccdA-over (see U.S. Provisional Application No. 60/475,004, filed Jun. 3, 2003, the disclosure of which is incorporated by reference herein in its entirety)), Bacillus spp. cells (particularly B. subtilis and B. megaterium cells), Streptomyces spp. cells, Erwinia spp. cells, Klebsiella spp. cells, Serratia spp. cells (particularly S. marcescens cells), Pseudomonas spp. cells (particularly P. aeruginosa cells), and Salmonella spp. cells (particularly S. typhimurium and S. typhi cells). Preferred animal host cells include insect cells (most particularly Drosophila melanogaster cells, Spodoptera frugiperda Sf9 and Sf21 cells and Trichoplusia High-Five cells), nematode cells (particularly C. elegans cells), avian cells, amphibian cells (particularly Xenopus laevis cells), reptilian cells, and mammalian cells (most particularly NIH3T3, CHO, COS, VERO, BHK and human cells). Preferred yeast host cells include Saccharomyces cerevisiae cells and Pichia pastoris cells. These and other suitable host cells are available commercially, for example, from Invitrogen Corp. (Carlsbad, Calif.), American Type Culture Collection (Manassas, Va.), and Agricultural Research Culture Collection (NRRL; Peoria, Ill.).

Methods for introducing the nucleic acid molecules and/or vectors of the invention into the host cells described herein, to produce host cells comprising one or more of the nucleic acid molecules and/or vectors of the invention, will be familiar to those of ordinary skill in the art. For instance, the nucleic acid molecules and/or vectors of the invention may be introduced into host cells using well known techniques of infection, transduction, electroporation, transfection, and transformation. The nucleic acid molecules and/or vectors of the invention may be introduced alone or in conjunction with other nucleic acid molecules and/or vectors and/or proteins, peptides or RNAs. Alternatively, the nucleic acid molecules and/or vectors of the invention may be introduced into host cells as a precipitate, such as a calcium phosphate precipitate, or in a complex with a lipid. Electroporation also may be used to introduce the nucleic acid molecules and/or vectors of the invention into a host. Likewise, such molecules may be introduced into chemically competent cells such as E. coli. If the vector is a virus, it may be packaged in vitro or introduced into a packaging cell and the packaged virus may be transduced into cells. Thus nucleic acid molecules of the invention may contain and/or encode one or more packaging signals (e.g., viral packaging signals which direct the packaging of viral nucleic acid molecules). Hence, a wide variety of techniques suitable for introducing the nucleic acid molecules and/or vectors of the invention into cells in accordance with this aspect of the invention are well known and routine to those of skill in the art. Such techniques are reviewed at length, for example, in Sambrook, J., et al., Molecular Cloning, a Laboratory Manual, 2nd Ed., Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press, pp. 16.30-16.55 (1989), Watson, J. D., et al., Recombinant DNA, 2nd Ed., New York: W.H. Freeman and Co., pp. 213-234 (1992), and Winnacker, E.-L., From Genes to Clones, New York: VCH Publishers (1987), which are illustrative of the many laboratory manuals that detail these techniques and which are incorporated by reference herein in their entireties for their relevant disclosures.

Polymerases

Polymerases for use in the invention include but are not limited to polymerases (DNA and RNA polymerases), and reverse transcriptases. DNA polymerases include, but are not limited to, Thermus thermophilus (Tth) DNA polymerase, Thermus aquaticus (Taq) DNA polymerase, Thermotoga neapolitana (Tne) DNA polymerase, Thermotoga maritima (Tma) DNA polymerase, Thermococcus litoralis (Tli or VENT™) DNA polymerase, Pyrococcus furiosus (Pfu) DNA polymerase, DEEPVENT™ DNA polymerase, Pyrococcus woesei (Pwo) DNA polymerase, Pyrococcus sp KOD2 (KOD) DNA polymerase, Bacillus stearothermophilus (Bst) DNA polymerase, Bacillus caldophilus (Bca) DNA polymerase, Sulfolobus acidocaldarius (Sac) DNA polymerase, Thermoplasma acidophilum (Tac) DNA polymerase, Thermus flavus (Tfl/Tub) DNA polymerase, Thermus ruber (Tru) DNA polymerase, Thermus brockianus (DYNAZYME™) DNA polymerase, Methanobacterium thermoautotrophicum (Mth) DNA polymerase, Mycobacterium DNA polymerase (Mtb, Mlep), E. coli pol I DNA polymerase, T5 DNA polymerase, T7 DNA polymerase, and generally pol I type DNA polymerases and mutants, variants and derivatives thereof. RNA polymerases such as T3, T5, T7 and SP6 and mutants, variants and derivatives thereof may also be used in accordance with the invention.

The nucleic acid polymerases used in the present invention may be mesophilic or thermophilic, and are preferably thermophilic. Preferred mesophilic DNA polymerases include the Pol I family of DNA polymerases (and their respective Klenow fragments), any of which may be isolated from organisms such as E. coli, H. influenzae, D. radiodurans, H. pylori, C. aurantiacus, R. prowazekii, T. pallidum, Synechocystis sp., B. subtilis, L. lactis, S. pneumoniae, M. tuberculosis, M. leprae, M. smegmatis, Bacteriophage L5, phi-C31, T7, T3, T5, SPO1, SPO2, mitochondrial from S. cerevisiae MIP-1, and eukaryotic C. elegans, and D. melanogaster (Astatke, M. et al., 1998, J. Mol. Biol. 278, 147-165), pol III type DNA polymerase isolated from any sources, and mutants, derivatives or variants thereof, and the like. Preferred thermostable DNA polymerases that may be used in the methods and compositions of the invention include Taq, Tne, Tma, Pfu, KOD, Tfl, Tth, Stoffel fragment, VENT™ and DEEPVENT™ DNA polymerases, and mutants, variants and derivatives thereof (U.S. Pat. No. 5,436,149; U.S. Pat. No. 4,889,818; U.S. Pat. No. 4,965,188; U.S. Pat. No. 5,079,352; U.S. Pat. No. 5,614,365; U.S. Pat. No. 5,374,553; U.S. Pat. No. 5,270,179; U.S. Pat. No. 5,047,342; U.S. Pat. No. 5,512,462; WO 92/06188; WO 92/06200; WO 96/10640; WO 97/09451; Barnes, W. M., Gene 112:29-35 (1992); Lawyer, F. C., et al., PCR Meth. Appl. 2:275-287 (1993); Flaman, J.-M, et al., Nucl. Acids Res. 22(15):3259-3260 (1994)).

Reverse transcriptases for use in this invention include any enzyme having reverse transcriptase activity. Such enzymes include, but are not limited to, retroviral reverse transcriptase, retrotransposon reverse transcriptase, hepatitis B reverse transcriptase, cauliflower mosaic virus reverse transcriptase, bacterial reverse transcriptase, Tth DNA polymerase, Taq DNA polymerase (Saiki, R. K., et al., Science 239:487-491 (1988); U.S. Pat. Nos. 4,889,818 and 4,965,188), Tne DNA polymerase (WO 96/10640 and WO 97/09451), Tma DNA polymerase (U.S. Pat. No. 5,374,553) and mutants, variants or derivatives thereof (see, e.g., WO 97/09451 and WO 98/47912). Preferred enzymes for use in the invention include those that have reduced, substantially reduced or eliminated RNase H activity. By an enzyme “substantially reduced in RNase H activity” is meant that the enzyme has less than about 20%, more preferably less than about 15%, 10% or 5%, and most preferably less than about 2%, of the RNase H activity of the corresponding wild-type or RNase H⁺ enzyme such as wild-type Moloney Murine Leukemia Virus (M-MLV), Avian Myeloblastosis Virus (AMV) or Rous Sarcoma Virus (RSV) reverse transcriptases. The RNase H activity of any enzyme may be determined by a variety of assays, such as those described, for example, in U.S. Pat. No. 5,244,797, in Kotewicz, M. L., et al., Nucl. Acids Res. 16:265 (1988) and in Gerard, G. F., et al., FOCUS 14(5):91 (1992), the disclosures of all of which are fully incorporated herein by reference. Particularly preferred polypeptides for use in the invention include, but are not limited to, M-MLV H⁻ reverse transcriptase, RSV H⁻ reverse transcriptase, AMV H⁻ reverse transcriptase, RAV (rous-associated virus) H⁻ reverse transcriptase, MAV (myeloblastosis-associated virus) H⁻ reverse transcriptase and HIV H⁻ reverse transcriptase. (See U.S. Pat. No. 5,244,797 and WO 98/47912). It will be understood by one of ordinary skill, however, that any enzyme capable of producing a DNA molecule from a ribonucleic acid molecule (i.e., having reverse transcriptase activity) may be equivalently used in the compositions, methods and kits of the invention.
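For illustration only, the percentage limits recited above for an enzyme “substantially reduced in RNase H activity” can be expressed as a simple comparison of a measured activity against that of the corresponding wild-type enzyme. The sketch below assumes that both activities were measured in the same assay and units; the function name `tightest_threshold` is hypothetical, and the word “about” in the recited limits is not modeled.

```python
# Recited limits, in percent of wild-type RNase H activity, tightest first.
THRESHOLDS_PERCENT = (2, 5, 10, 15, 20)


def tightest_threshold(enzyme_activity: float, wild_type_activity: float):
    """Return the tightest recited limit (in percent) that the enzyme falls
    under, or None if it retains 20% or more of wild-type activity.

    Both activities are assumed to be in the same arbitrary units.
    """
    if wild_type_activity <= 0:
        raise ValueError("wild-type activity must be positive")
    percent = 100.0 * enzyme_activity / wild_type_activity
    for limit in THRESHOLDS_PERCENT:
        if percent < limit:
            return limit
    return None


# An enzyme retaining 12% of wild-type activity is under the 15% limit,
# and therefore also under the 20% limit.
print(tightest_threshold(12.0, 100.0))
```

A return value of 2 corresponds to the most preferred range, while None indicates an enzyme that would not be regarded as substantially reduced in RNase H activity under the recited limits.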

The enzymes having polymerase activity for use in the invention may be obtained commercially, for example from Invitrogen Corp. (Carlsbad, Calif.), Perkin-Elmer (Branchburg, N.J.), New England BioLabs (Beverly, Mass.) or Boehringer Mannheim Biochemicals (Indianapolis, Ind.). Enzymes having reverse transcriptase activity for use in the invention may be obtained commercially, for example, from Invitrogen Corp., (Carlsbad, Calif.), Pharmacia (Piscataway, N.J.), Sigma (Saint Louis, Mo.) or Boehringer Mannheim Biochemicals (Indianapolis, Ind.). Alternatively, polymerases or reverse transcriptases having polymerase activity may be isolated from their natural viral or bacterial sources according to standard procedures for isolating and purifying natural proteins that are well-known to one of ordinary skill in the art (see, e.g., Houts, G. E., et al., J. Virol. 29:517 (1979)). In addition, such polymerases/reverse transcriptases may be prepared by recombinant DNA techniques that are familiar to one of ordinary skill in the art (see, e.g., Kotewicz, M. L., et al., Nucl. Acids Res. 16:265 (1988); U.S. Pat. No. 5,244,797; WO 98/47912; Soltis, D. A., and Skalka, A. M., Proc. Natl. Acad. Sci. USA 85:3372-3376 (1988)). Examples of enzymes having polymerase activity and reverse transcriptase activity may include any of those described in the present application.

Supports and Arrays

Supports for use in accordance with the invention may be any support or matrix suitable for attaching nucleic acid molecules comprising one or more recombination sites or portions thereof. Such molecules may be added or bound (covalently or non-covalently) to the supports of the invention by any technique or any combination of techniques well known in the art. Supports of the invention may comprise nitrocellulose, diazocellulose, glass, polystyrene (including microtitre plates), polyvinylchloride, polypropylene, polyethylene, polyvinylidenedifluoride (PVDF), dextran, Sepharose, agar, starch and nylon. Supports of the invention may be in any form or configuration including beads, filters, membranes, sheets, frits, plugs, columns and the like. Solid supports may also include multi-well plates (such as microtitre plates), for example 12-well plates, 24-well plates, 48-well plates, 96-well plates, and 384-well plates. Preferred beads are made of glass, latex or a magnetic material (magnetic, paramagnetic or superparamagnetic beads).
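By way of a hedged example, positions on the multi-well plates listed above are commonly referred to by a row letter and a column number. The sketch below converts a zero-based position into such a name. It assumes the customary row-by-column layouts for these plate sizes (e.g., 8 rows by 12 columns for a 96-well plate) and row-major ordering; the names `PLATE_LAYOUTS` and `well_name` are hypothetical.

```python
import string

# Customary (rows, columns) layouts for the plate sizes named in the text.
PLATE_LAYOUTS = {12: (3, 4), 24: (4, 6), 48: (6, 8), 96: (8, 12), 384: (16, 24)}


def well_name(index: int, plate_size: int = 96) -> str:
    """Map a zero-based, row-major well index to a name such as 'A1'."""
    rows, cols = PLATE_LAYOUTS[plate_size]
    if not 0 <= index < rows * cols:
        raise IndexError(f"index {index} is outside a {plate_size}-well plate")
    row, col = divmod(index, cols)
    return f"{string.ascii_uppercase[row]}{col + 1}"


print(well_name(0))         # first well of a 96-well plate
print(well_name(95))        # last well of a 96-well plate
print(well_name(383, 384))  # last well of a 384-well plate
```

Such a mapping is one simple way to record which nucleic acid molecule has been attached at which position of a support.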

In a preferred aspect, methods of the invention may be used to prepare arrays of proteins or nucleic acid molecules (RNA or DNA) or arrays of other molecules, compounds, and/or substances. Such arrays may be formed on microplates, glass slides or standard blotting membranes and may be referred to as microarrays or gene-chips depending on the format and design of the array. Uses for such arrays include gene discovery, gene expression profiling, genotyping (SNP analysis, pharmacogenomics, toxicogenetics), and the preparation of nanotechnology devices.

Synthesis and use of nucleic acid arrays and generally attachment of nucleic acids to supports have been described (see, e.g., U.S. Pat. No. 5,436,327, U.S. Pat. No. 5,800,992, U.S. Pat. No. 5,445,934, U.S. Pat. No. 5,763,170, U.S. Pat. No. 5,599,695 and U.S. Pat. No. 5,837,832). An automated process for attaching various reagents to positionally defined sites on a substrate is provided in Pirrung, et al. U.S. Pat. No. 5,143,854 and Barrett, et al. U.S. Pat. No. 5,252,743. For example, disulfide-modified oligonucleotides can be covalently attached to solid supports using disulfide bonds. (See Rogers et al., Anal. Biochem. 266:23-30 (1999).) Further, disulfide-modified oligonucleotides can be peptide nucleic acid (PNA) using solid-phase synthesis. (See Aldrian-Herrada et al., J. Pept. Sci. 4:266-281 (1998).) Thus, nucleic acid molecules comprising one or more recombination sites or portions thereof can be added to one or more supports (or can be added in arrays on such supports) and nucleic acids, proteins or other molecules and/or compounds can be added to such supports through recombination methods of the invention. Conjugation of nucleic acids to a molecule of interest is known in the art and thus one of ordinary skill can produce molecules and/or compounds comprising recombination sites (or portions thereof) for attachment to supports (in array format or otherwise) according to the invention.

Essentially, any conceivable support may be employed in the invention. The support may be biological, non-biological, organic, inorganic, or a combination of any of these, existing as particles, strands, precipitates, gels, sheets, tubing, spheres, containers, capillaries, pads, slices, films, plates, slides, etc. The support may have any convenient shape, such as a disc, square, sphere, circle, etc. The support is preferably flat but may take on a variety of alternative surface configurations. For example, the support may contain raised or depressed regions which may be used for synthesis or other reactions. The support and its surface preferably form a rigid support on which to carry out the reactions described herein. The support and its surface are also chosen to provide appropriate light-absorbing characteristics. For instance, the support may be a polymerized Langmuir Blodgett film, functionalized glass, Si, Ge, GaAs, GaP, SiO2, SiN4, modified silicon, or any one of a wide variety of gels or polymers such as (poly)tetrafluoroethylene, (poly)vinylidenedifluoride, polystyrene, polycarbonate, or combinations thereof. Other support materials will be readily apparent to those of skill in the art upon review of this disclosure. In a preferred embodiment the support is flat glass or single-crystal silicon.

Thus, the invention provides methods for preparing arrays of nucleic acid molecules attached to supports. In some embodiments, these nucleic acid molecules will have recombination sites at one or more (e.g., one, two, three or four) of their termini. In some additional embodiments, one nucleic acid molecule will be attached directly to the support, or to a specific section of the support, and one or more additional nucleic acid molecules will be indirectly attached to the support via attachment to the nucleic acid molecule which is attached directly to the support. In such cases, the nucleic acid molecule which is attached directly to the support provides a site of nucleation around which a nucleic acid array may be constructed.

The invention further provides methods for linking supports to each other and for linking molecules bound to the same support together. Using FIG. 11 for non-limiting illustration of one embodiment of such a process, a recombination site designated RS₆ can be positioned at the end of the RS₅ site on the A/B composition shown attached to the support in the lower portion of the figure. Further, an identical composition may also be attached to another part of the same or a different support. Recombination between the RS₆ sites can then be used to connect the two compositions, thereby forming a linkage either between two compositions attached to the same support or between two compositions attached to different supports. The invention thus provides methods for cross-linking compounds attached to the same support by linking one or more compositions bound to the support using recombination sites. The invention also provides methods for cross-linking separate supports by linking one or more compositions bound to these supports using recombination sites.

In one aspect, the invention provides supports containing nucleic acid molecules which are produced by methods of the invention. In many embodiments, the nucleic acid molecules of these supports will contain at least one recombination site. In some embodiments, this recombination site will have undergone recombination prior to attachment of the nucleic acid molecule to the support. These bound nucleic acid molecules are useful, for example, for identifying other nucleic acid molecules (e.g., nucleic acid molecules which hybridize to the bound nucleic acid molecules under stringent hybridization conditions) and proteins which have binding affinity for the bound nucleic acid molecules. Expression products may also be produced from these bound nucleic acid molecules while the nucleic acid molecules remain bound to the support. Thus, compositions and methods of the invention can be used to identify expression products and products produced by these expression products.

In other embodiments, nucleic acid molecules bound to supports will undergo recombination after attachment of the nucleic acid molecule to the support. As already discussed, these bound nucleic acid molecules may thus be used to identify nucleic acid molecules which encode expression products involved in one or a specified number of biological processes or pathways.

Further, nucleic acid molecules attached to supports may be released from these supports. Methods for releasing nucleic acid molecules include restriction digestion, recombination, and altering conditions (e.g., temperature, salt concentrations, etc.) to induce the dissociation of nucleic acid molecules which have hybridized to bound nucleic acid molecules. Thus, methods of the invention include the use of supports to which nucleic acid molecules have been bound for the isolation of nucleic acid molecules.

As noted above, in one aspect, the invention provides methods for screening nucleic acid libraries to identify nucleic acid molecules which encode expression products involved in the same biological processes or pathways. In specific embodiments, such methods involve (1) attaching a nucleic acid molecule comprising at least one recombination site to a support, (2) contacting the bound nucleic acid molecule with a library of nucleic acid molecules, wherein individual nucleic acid molecules of the library comprise at least one recombination site, under conditions which facilitate recombination between the bound nucleic acid molecule and nucleic acid molecules of the library, and (3) screening for either expression products of the nucleic acid molecule formed by recombination or products produced by the expression products of these nucleic acid molecules.

Examples of compositions which can be formed by binding nucleic acid molecules to supports are “gene chips,” often referred to in the art as “DNA microarrays” or “genome chips” (see U.S. Pat. Nos. 5,412,087 and 5,889,165, and PCT Publication Nos. WO 97/02357, WO 97/43450, WO 98/20967, WO 99/05574, WO 99/05591, and WO 99/40105, the disclosures of which are incorporated by reference herein in their entireties). In various embodiments of the invention, these gene chips may contain two- and three-dimensional nucleic acid arrays described herein.

The addressability of nucleic acid arrays of the invention means that molecules or compounds which bind to particular nucleotide sequences can be attached to the arrays. Thus, components such as proteins and other nucleic acids can be attached to specific locations/positions in nucleic acid arrays of the invention.

Thus, in one aspect, the invention provides affinity purification methods comprising (1) providing a support to which nucleic acid molecules comprising at least one recombination site are bound, (2) attaching one or more additional nucleic acid molecules to the support using recombination reactions, (3) contacting the support with a composition containing molecules or compounds which have binding affinity for nucleic acid molecules bound to the support, under conditions which facilitate binding of the molecules or compounds to the nucleic acid molecules bound to the support, (4) altering the conditions to facilitate the release of the bound molecules or compounds, and (5) collecting the released molecules or compounds.

Methods of Nucleic Acid Synthesis, Amplification and Sequencing

The present invention may be used in combination with any method involving the synthesis of nucleic acid molecules, such as DNA (including cDNA) and RNA molecules. Such methods include, but are not limited to, nucleic acid synthesis methods, nucleic acid amplification methods and nucleic acid sequencing methods. Such methods may be used to prepare molecules (e.g., starting molecules) used in the invention or to further manipulate molecules or vectors produced by the invention.

Nucleic acid synthesis methods according to this aspect of the invention may comprise one or more steps (e.g., two, three, four, five, seven, ten, twelve, fifteen, etc.). For example, the invention provides a method for synthesizing a nucleic acid molecule comprising (a) mixing a nucleic acid template (e.g., nucleic acid molecules or vectors of the invention) with one or more primers (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.) and one or more enzymes (e.g., two, three, four, five, seven, etc.) having polymerase or reverse transcriptase activity to form a mixture; and (b) incubating the mixture under conditions sufficient to make a first nucleic acid molecule complementary to all or a portion of the template. According to this aspect of the invention, the nucleic acid template may be a DNA molecule such as a cDNA molecule or library, or an RNA molecule such as an mRNA molecule. Conditions sufficient to allow synthesis such as pH, temperature, ionic strength, and incubation times may be optimized by those skilled in the art. If desired, recombination sites may be added to such synthesized molecules during or after the synthesis process (see, e.g., U.S. patent application Ser. No. 09/177,387 filed Oct. 23, 1998 based on U.S. provisional patent application No. 60/065,930 filed Oct. 24, 1997).
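The relationship between a template and the “first nucleic acid molecule complementary to all or a portion of the template” can be illustrated computationally. The following minimal sketch derives the sequence of the complementary strand for a DNA template and, for an RNA template, the first-strand cDNA that an enzyme having reverse transcriptase activity would be expected to produce. It models base pairing only, not the reaction itself, and the function names are hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complementary_strand(template: str) -> str:
    """Return the DNA strand complementary to a DNA template.

    Both the template and the result are written 5' to 3', so the
    complemented sequence is reversed.
    """
    template = template.upper()
    if set(template) - set("ACGT"):
        raise ValueError("template must contain only A, C, G and T")
    return template.translate(DNA_COMPLEMENT)[::-1]


def first_strand_cdna(mrna: str) -> str:
    """Return the first-strand cDNA (5' to 3') complementary to an mRNA."""
    mrna = mrna.upper()
    if set(mrna) - set("ACGU"):
        raise ValueError("mRNA must contain only A, C, G and U")
    # Treat U as T for pairing purposes, then complement as for DNA.
    return complementary_strand(mrna.replace("U", "T"))


print(complementary_strand("ATGC"))
print(first_strand_cdna("AUGC"))
```

Both calls print GCAT, since the DNA template ATGC and the mRNA AUGC specify the same complementary DNA strand.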

In accordance with the invention, the target or template nucleic acid molecules or libraries may be prepared from nucleic acid molecules obtained from natural sources, such as a variety of cells, tissues, organs or organisms. Cells that may be used as sources of nucleic acid molecules may be prokaryotic (bacterial cells, including those of species of the genera Escherichia, Bacillus, Serratia, Salmonella, Staphylococcus, Streptococcus, Clostridium, Chlamydia, Neisseria, Treponema, Mycoplasma, Borrelia, Legionella, Pseudomonas, Mycobacterium, Helicobacter, Erwinia, Agrobacterium, Rhizobium, and Streptomyces) or eukaryotic (including fungi (especially yeasts), plants, protozoans and other parasites, and animals including insects (particularly Drosophila spp. cells), nematodes (particularly Caenorhabditis elegans cells), and mammals (particularly human cells)).

Of course, other techniques of nucleic acid synthesis which may be advantageously used will be readily apparent to one of ordinary skill in the art.

In other aspects of the invention, the invention may be used in combination with methods for amplifying or sequencing nucleic acid molecules. Nucleic acid amplification methods according to this aspect of the invention may include the use of one or more polypeptides having reverse transcriptase activity, in methods generally known in the art as one-step (e.g., one-step RT-PCR) or two-step (e.g., two-step RT-PCR) reverse transcriptase-amplification reactions. For amplification of long nucleic acid molecules (i.e., greater than about 3-5 Kb in length), a combination of DNA polymerases may be used, as described in WO 98/06736 and WO 95/16028.

Amplification methods according to the invention may comprise one or more steps (e.g., two, three, four, five, seven, ten, etc.). For example, the invention provides a method for amplifying a nucleic acid molecule comprising (a) mixing one or more enzymes with polymerase activity (e.g., two, three, four, five, seven, ten, etc.) with one or more nucleic acid templates (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, one hundred, etc.); and (b) incubating the mixture under conditions sufficient to allow the enzyme with polymerase activity to amplify one or more nucleic acid molecules complementary to all or a portion of the templates. The invention also provides nucleic acid molecules amplified by such methods. If desired, recombination sites may be added to such amplified molecules during or after the amplification process (see, e.g., U.S. patent application Ser. No. 09/177,387 filed Oct. 23, 1998 based on U.S. provisional patent application No. 60/065,930 filed Oct. 24, 1997).

General methods for amplification and analysis of nucleic acid molecules or fragments are well known to one of ordinary skill in the art (see, e.g., U.S. Pat. Nos. 4,683,195; 4,683,202; and 4,800,159; Innis, M. A., et al., eds., PCR Protocols: A Guide to Methods and Applications, San Diego, Calif.: Academic Press, Inc. (1990); Griffin, H. G., and Griffin, A. M., eds., PCR Technology: Current Innovations, Boca Raton, Fla.: CRC Press (1994)). For example, amplification methods which may be used in accordance with the present invention include PCR (U.S. Pat. Nos. 4,683,195 and 4,683,202), Strand Displacement Amplification (SDA; U.S. Pat. No. 5,455,166; EP 0 684 315), and Nucleic Acid Sequence-Based Amplification (NASBA; U.S. Pat. No. 5,409,818; EP 0 329 822).

Typically, these amplification methods comprise: (a) mixing one or more enzymes with polymerase activity with the nucleic acid sample in the presence of one or more primers, and (b) amplifying the nucleic acid sample to generate a collection of amplified nucleic acid fragments, preferably by PCR or equivalent automated amplification technique.

Following amplification or synthesis by the methods of the present invention, the amplified or synthesized nucleic acid fragments may be isolated for further use or characterization. This step is usually accomplished by separation of the amplified or synthesized nucleic acid fragments by size or by any physical or biochemical means including gel electrophoresis, capillary electrophoresis, chromatography (including sizing, affinity and immunochromatography), density gradient centrifugation and immunoadsorption. Separation of nucleic acid fragments by gel electrophoresis is particularly preferred, as it provides a rapid and highly reproducible means of sensitive separation of a multitude of nucleic acid fragments, and permits direct, simultaneous comparison of the fragments in several samples of nucleic acids. One can extend this approach, in another preferred embodiment, to isolate and characterize these fragments or any nucleic acid fragment amplified or synthesized by the methods of the invention. Thus, the invention is also directed to isolated nucleic acid molecules produced by the amplification or synthesis methods of the invention.

In this embodiment, one or more of the amplified or synthesized nucleic acid fragments are removed from the gel which was used for identification (see above), according to standard techniques such as electroelution or physical excision. The isolated unique nucleic acid fragments may then be inserted into standard vectors, including expression vectors, suitable for transfection or transformation of a variety of prokaryotic (bacterial) or eukaryotic (yeast, plant or animal including human and other mammalian) cells. Alternatively, nucleic acid molecules produced by the methods of the invention may be further characterized, for example by sequencing (i.e., determining the nucleotide sequence of the nucleic acid fragments), by methods described below and others that are standard in the art (see, e.g., U.S. Pat. Nos. 4,962,022 and 5,498,523, which are directed to methods of DNA sequencing).

Nucleic acid sequencing methods according to the invention may comprise one or more steps. For example, the invention may be combined with a method for sequencing a nucleic acid molecule comprising (a) mixing an enzyme with polymerase activity with a nucleic acid molecule to be sequenced, one or more primers, one or more nucleotides, and one or more terminating agents (such as dideoxynucleotides) to form a mixture; (b) incubating the mixture under conditions sufficient to synthesize a population of molecules complementary to all or a portion of the molecule to be sequenced; and (c) separating the population to determine the nucleotide sequence of all or a portion of the molecule to be sequenced.

Nucleic acid sequencing techniques which may be employed include dideoxy sequencing methods such as those disclosed in U.S. Pat. Nos. 4,962,022 and 5,498,523.
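By way of illustration only, the logic of steps (a) through (c) of the sequencing method described above may be sketched as a toy model. The sketch assumes, as a simplification not stated in the text, that a terminated product exists for every position of the template; the template sequence and the function names are hypothetical.

```python
# Toy model of dideoxy (chain-termination) sequencing.
# Assumption: one terminated product exists for every template position;
# real reactions are stochastic and are read from labeled fragments.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminated_products(template):
    """Return every chain-terminated product synthesized against `template`.

    The template is written 3'->5' so the new strand grows left to right.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    # Synthesis stops wherever a terminating agent is incorporated,
    # giving one product per position.
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]


def read_sequence(products):
    """Sort products by length (the separation step) and read the last bases."""
    return "".join(p[-1] for p in sorted(products, key=len))


template = "TACGGTCA"  # hypothetical template, written 3'->5'
products = terminated_products(template)
print(read_sequence(products))  # prints ATGCCAGT
```

Reading the terminal base of each product in order of increasing length recovers the sequence complementary to the template, which is the principle behind step (c).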

Kits

In another aspect, the invention provides kits which may be used in conjunction with the invention. Kits according to this aspect of the invention may comprise one or more containers, which may contain one or more components selected from the group consisting of one or more nucleic acid molecules or vectors of the invention, one or more primers, the molecules and/or compounds of the invention, supports of the invention, one or more polymerases, one or more reverse transcriptases, one or more recombination proteins (or other enzymes for carrying out the methods of the invention), one or more buffers, one or more detergents, one or more restriction endonucleases, one or more nucleotides, one or more terminating agents (e.g., ddNTPs), one or more transfection reagents, pyrophosphatase, and the like.

A wide variety of nucleic acid molecules or vectors of the invention can be used with the invention. Further, due to the modularity of the invention, these nucleic acid molecules and vectors can be combined in a wide range of ways. Examples of nucleic acid molecules which can be supplied in kits of the invention include those that contain promoters, signal peptides, enhancers, repressors, selection markers, transcription signals, translation signals, primer hybridization sites (e.g., for sequencing or PCR), recombination sites, restriction sites and polylinkers, sites which suppress the termination of translation in the presence of a suppressor tRNA, suppressor tRNA coding sequences, sequences which encode domains and/or regions (e.g., 6 His tag) for the preparation of fusion proteins, origins of replication, telomeres, centromeres, and the like. Similarly, libraries can be supplied in kits of the invention. These libraries may be in the form of replicable nucleic acid molecules or they may comprise nucleic acid molecules which are not associated with an origin of replication. As one skilled in the art would recognize, the nucleic acid molecules of libraries, as well as other nucleic acid molecules, which are not associated with an origin of replication either could be inserted into other nucleic acid molecules which have an origin of replication or would be expendable kit components.

Further, in some embodiments, libraries supplied in kits of the invention may comprise two components: (1) the nucleic acid molecules of these libraries and (2) 5′ and/or 3′ recombination sites. In some embodiments, when the nucleic acid molecules of a library are supplied with 5′ and/or 3′ recombination sites, it will be possible to insert these molecules into vectors, which also may be supplied as a kit component, using recombination reactions. In other embodiments, recombination sites can be attached to the nucleic acid molecules of the libraries before use (e.g., by the use of a ligase, which may also be supplied with the kit). In such cases, nucleic acid molecules which contain recombination sites or primers which can be used to generate recombination sites may be supplied with the kits.

Vectors supplied in kits of the invention can vary greatly. In most instances, these vectors will contain an origin of replication, at least one selectable marker, and at least one recombination site. For example, vectors supplied in kits of the invention can have four separate recombination sites which allow for insertion of nucleic acid molecules at two different locations. A vector of this type is shown schematically in FIG. 6. Other attributes of vectors supplied in kits of the invention are described elsewhere herein.

Kits of the invention can also be supplied with primers. These primers will generally be designed to anneal to molecules having specific nucleotide sequences. For example, these primers can be designed for use in PCR to amplify a particular nucleic acid molecule. Further, primers supplied with kits of the invention can be sequencing primers designed to hybridize to vector sequences. Thus, such primers will generally be supplied as part of a kit for sequencing nucleic acid molecules which have been inserted into a vector.

One or more buffers (e.g., one, two, three, four, five, eight, ten, fifteen) may be supplied in kits of the invention. These buffers may be supplied at working concentrations or may be supplied in concentrated form and then diluted to the working concentrations. These buffers will often contain salt, metal ions, co-factors, metal ion chelating agents, etc. for the enhancement of activities or the stabilization of either the buffer itself or molecules in the buffer. Further, these buffers may be supplied in dried or aqueous forms. When buffers are supplied in a dried form, they will generally be dissolved in water prior to use. Examples of buffers suitable for use in kits of the invention are set out in the following examples.

Supports suitable for use with the invention (e.g., solid supports, semi-solid supports, beads, multi-well tubes, etc., described above in more detail) may also be supplied with kits of the invention. Exemplary uses of supports in processes of the invention are shown in FIGS. 10-13.

Kits of the invention may contain virtually any combination of the components set out above or described elsewhere herein. As one skilled in the art would recognize, the components supplied with kits of the invention will vary with the intended use for the kits. Thus, kits may be designed to perform various functions set out in this application and the components of such kits will vary accordingly.

It will be understood by one of ordinary skill in the relevant arts that other suitable modifications and adaptations to the methods and applications described herein are readily apparent from the description of the invention contained herein in view of information known to the ordinarily skilled artisan, and may be made without departing from the scope of the invention or any embodiment thereof. Having now described the present invention in detail, the same will be more clearly understood by reference to the following examples, which are included herewith for purposes of illustration only and are not intended to be limiting of the invention.

The entire disclosures of U.S. application Ser. No. 08/486,139 (now abandoned), filed Jun. 7, 1995, U.S. application Ser. No. 08/663,002, filed Jun. 7, 1996 (now U.S. Pat. No. 5,888,732), U.S. application Ser. No. 09/233,492, filed Jan. 20, 1999, U.S. Pat. No. 6,143,557, issued Nov. 7, 2000, U.S. Appl. No. 60/065,930, filed Oct. 24, 1997, U.S. application Ser. No. 09/177,387 filed Oct. 23, 1998, U.S. application Ser. No. 09/296,280, filed Apr. 22, 1999, U.S. application Ser. No. 09/296,281, filed Apr. 22, 1999, U.S. Appl. No. 60/108,324, filed Nov. 13, 1998, U.S. application Ser. No. 09/438,358, filed Nov. 12, 1999, U.S. application Ser. No. 09/695,065, filed Oct. 25, 2000, U.S. application Ser. No. 09/432,085 filed Nov. 2, 1999, U.S. Appl. No. 60/122,389, filed Mar. 2, 1999, U.S. Appl. No. 60/126,049, filed Mar. 23, 1999, U.S. Appl. No. 60/136,744, filed May 28, 1999, U.S. Appl. No. 60/122,392, filed Mar. 2, 1999, and U.S. Appl. No. 60/161,403, filed Oct. 25, 1999, are herein incorporated by reference.

EXAMPLES

Example 1

Simultaneous Cloning of Two Nucleic Acid Segments Using an LR Reaction

Two nucleic acid segments may be cloned in a single reaction using methods of the present invention. Methods of the present invention may comprise the steps of providing a first nucleic acid segment flanked by a first and a second recombination site, providing a second nucleic acid segment flanked by a third and a fourth recombination site, wherein either the first or the second recombination site is capable of recombining with either the third or the fourth recombination site, conducting a recombination reaction such that the two nucleic acid segments are recombined into a single nucleic acid molecule and cloning the single nucleic acid molecule.

With reference to FIG. 2, two nucleic acid segments flanked by recombination sites may be provided. Those skilled in the art will appreciate that the nucleic acid segments may be provided either as discrete fragments or as part of a larger nucleic acid molecule and may be circular and optionally supercoiled or linear. The sites can be selected such that one member of a reactive pair of sites flanks each of the two segments.

By “reactive pair of sites,” what is meant is two recombination sites that can, in the presence of the appropriate enzymes and cofactors, recombine. For example, in some preferred embodiments, one nucleic acid molecule may comprise an attR site while the other comprises an attL site that reacts with the attR site. As the products of an LR reaction are two molecules, one of which comprises an attB site and one of which comprises an attP site, it is possible to arrange the orientation of the starting attL and attR sites such that, after joining, the two starting nucleic acid segments are separated by a nucleic acid sequence that comprises either an attB site or an attP site.

In some preferred embodiments, the sites may be arranged such that the two starting nucleic acid segments are separated by an attB site after the recombination reaction. In other preferred embodiments, recombination sites from other recombination systems may be used. For example, in some embodiments one or more of the recombination sites may be a lox site or derivative. In some preferred embodiments, recombination sites from more than one recombination system may be used in the same construct. For example, one or more of the recombination sites may be an att site while others may be lox sites. Various combinations of sites from different recombination systems may occur to those skilled in the art and such combinations are deemed to be within the scope of the present invention.

As shown in FIG. 2, nucleic acid segment A (DNA-A) may be flanked by recombination sites having unique specificity, for example attL1 and attL3 sites, and nucleic acid segment B (DNA-B) may be flanked by recombination sites attR3 and attL2. For illustrative purposes, the segments are indicated as DNA. This should not be construed as limiting the nucleic acids used in the practice of the present invention to DNA to the exclusion of other nucleic acids. In addition, in this and the subsequent examples, the designation of the recombination sites (i.e., L1, L3, R1, R3, etc.) is merely intended to convey that the recombination sites used have different specificities and should not be construed as limiting the invention to the use of the specifically recited sites. One skilled in the art could readily substitute other pairs of sites for those specifically exemplified.

The attR3 and attL3 sites comprise a reactive pair of sites. Other pairs of unique recombination sites may be used to flank the nucleic acid segments. For example, lox sites could be used as one reactive pair while another reactive pair may be att sites and suitable recombination proteins included in the reaction. Likewise, the recombination sites discussed above can be used in various combinations. In this embodiment, the only critical feature is that, of the recombination sites flanking each segment, one member of a reactive pair of sites, in this example an LR pair L3 and R3, is present on one nucleic acid segment and the other member of the reactive pair is present on the other nucleic acid segment. The two segments may be contacted with the appropriate enzymes and a Destination Vector.

The Destination Vector comprises a suitable selectable marker flanked by two recombination sites. In some embodiments, the selectable marker may be a negative selectable marker (such as a toxic gene, e.g., ccdB). One site in the Destination Vector will be compatible with one site present on one of the nucleic acid segments while the other site in the Destination Vector will be compatible with a site present on the other nucleic acid segment.

Absent a recombination between the two starting nucleic acid segments, neither starting nucleic acid segment has recombination sites compatible with both the sites in the Destination Vector. Thus, neither starting nucleic acid segment can replace the selectable marker present in the Destination Vector.
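For purposes of illustration only, the site compatibility logic described above may be sketched as follows. The sketch models only the specificity and type of each site, not its orientation or sequence. The class and function names are hypothetical, and the Destination Vector is assumed to carry attR1 and attR2 sites flanking ccdB, an arrangement inferred from the sites recited for FIG. 2 rather than stated explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    kind: str  # "L", "R", "B" or "P"
    spec: int  # specificity number, e.g. 1, 2 or 3

    def reacts_with(self, other):
        """A reactive pair is L with R, or B with P, of the same specificity."""
        partners = {"L": "R", "R": "L", "B": "P", "P": "B"}
        return self.spec == other.spec and partners[self.kind] == other.kind


@dataclass(frozen=True)
class Segment:
    name: str
    left: Site
    right: Site


def can_replace_marker(segment, vector):
    """True if both ends of `segment` pair with the matching ends of `vector`."""
    return (segment.left.reacts_with(vector.left)
            and segment.right.reacts_with(vector.right))


def join(a, b):
    """Join a and b through the reactive pair on a's right and b's left end."""
    if not a.right.reacts_with(b.left):
        raise ValueError("no reactive pair between the segments")
    # With the sites oriented as in FIG. 2, an attB site remains between the
    # joined segments (orientation itself is not modeled in this sketch).
    return Segment(f"{a.name}-attB{a.right.spec}-{b.name}", a.left, b.right)


dna_a = Segment("DNA-A", Site("L", 1), Site("L", 3))
dna_b = Segment("DNA-B", Site("R", 3), Site("L", 2))
dest = Segment("ccdB", Site("R", 1), Site("R", 2))  # assumed attR1-ccdB-attR2

print(can_replace_marker(dna_a, dest))  # False
print(can_replace_marker(dna_b, dest))  # False
joined = join(dna_a, dna_b)
print(joined.name, can_replace_marker(joined, dest))  # DNA-A-attB3-DNA-B True
```

Only the joined molecule carries sites compatible with both ends of the vector, which is why the selectable marker can be replaced only after the two segments have recombined with each other.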

The reaction mixture may be incubated at about 25° C. for from about 60 minutes to about 16 hours. All or a portion of the reaction mixture will be used to transform competent microorganisms and the microorganisms screened for the presence of the desired construct.

In some embodiments, the Destination Vector comprises a negative selectable marker and the microorganisms transformed are susceptible to the negative selectable marker present on the Destination Vector. The transformed microorganisms will be grown under conditions permitting the negative selection against microorganisms not containing the desired recombination product.

In FIG. 2, the resulting desired product consists of DNA-A and DNA-B separated by an attB3 site and cloned into the Destination Vector backbone. In this embodiment, the same type of reaction (i.e., an LR reaction) may be used to combine the two fragments and insert the combined fragments into a Destination Vector.

In some embodiments, it may not be necessary to control the orientation of one or more of the nucleic acid segments and recombination sites of the same specificity can be used on both ends of the segment.

With reference to FIG. 2, if the orientation of segment A with respect to segment B were not critical, segment A could be flanked by L1 sites on both ends oriented as inverted repeats and the end of segment B to be joined to segment A could be equipped with an R1 site. This might be useful in generating additional complexity in the formation of combinatorial libraries between segments A and B. That is, the joining of the segments can occur in various orientations and given that one or both segments joined may be derived from one or more libraries, a new population or library comprising hybrid molecules in random orientations may be constructed according to the invention.

Although, in the present examples, the recombination between the two starting nucleic acid segments is shown as occurring before the recombination reactions with the Destination Vector, the order of the recombination reactions is not important. Thus, in some embodiments, it may be desirable to conduct the recombination reaction between the segments and isolate the combined segments. The combined segments can be used directly, for example, may be amplified, sequenced or used as linear expression elements as taught by Sykes, et al. (Nature Biotechnology 17:355-359, 1999). In some embodiments, the joined segments may be encapsulated as taught by Tawfik, et al. (Nature Biotechnology 16:652-656, 1998) and subsequently assayed for one or more desirable properties. In some embodiments, the combined segments may be used for in vitro expression of RNA by, for example, including a promoter such as the T7 promoter or SP6 promoter on one of the segments. Such in vitro expressed RNA may optionally be translated in an in vitro translation system such as rabbit reticulocyte lysate.

Optionally, the joined segments may be further reacted with a Destination Vector resulting in the insertion of the combined segments into the vector. In some instances, it may be desirable to isolate an intermediate comprising one of the segments and the vector. For insertion of the segments into a vector, it is not critical to the practice of the present invention whether the recombination reaction joining the two segments occurs before or after the recombination reaction between the segments and the Destination Vector.

According to the invention, all three recombination reactions preferably occur (i.e., the reaction between segment A and the Destination Vector, the reaction between segment B and the Destination Vector, and the reaction between segment A and segment B) in order to produce a nucleic acid molecule in which both of the two starting nucleic acid segments are now joined in a single molecule. In some embodiments, recombination sites may be selected such that, after insertion into the vector, the recombination sites flanking the joined segments form a reactive pair of sites and the joined segments may be excised from the vector by reaction of the flanking sites with suitable recombination proteins.

With reference to FIG. 2, if the L2 site on segment B were replaced by an L1 site in the opposite orientation with respect to segment B (i.e., the long portion of the box indicating the recombination site was not adjacent to the segment) and the R2 site in the vector were replaced by an R1 site in opposite orientation, the recombination reaction would produce an attP1 site in the vector. The attP1 site would then be capable of reaction with the attB1 site on the other end of the joined segments. Thus, the joined segments could be excised using the recombination proteins appropriate for a BP reaction.

This embodiment of the invention is particularly suited for the construction of combinatorial libraries. In some preferred embodiments, each of the nucleic acid segments in FIG. 2 may represent libraries, each of which may have a known or unknown nucleic acid sequence to be screened. In some embodiments, one or more of the segments may have a sequence encoding one or more permutations of the amino acid sequence of a given peptide, polypeptide or protein. In some embodiments, each segment may have a sequence that encodes a protein domain or a library representing various permutations of the sequence of a protein domain. For example, one segment may represent a library of mutated forms of the variable domain of an antibody light chain while the other segment represents a library of mutated forms of an antibody heavy chain. Thus, recombination would generate a population of molecules (e.g., antibodies, single-chain antigen-binding proteins, etc.) each potentially containing a unique combination of sequences and, therefore, a unique binding specificity.
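As a minimal sketch of why joining two libraries generates diversity, the following enumerates every pairing of members of two small libraries. The member names are hypothetical placeholders, and the attB3 site between the segments follows the arrangement described for FIG. 2.

```python
from itertools import product

# Hypothetical library members; real libraries would be far larger.
light_chain_variants = ["VL-1", "VL-2", "VL-3"]
heavy_chain_variants = ["VH-1", "VH-2"]

# Each member of one library can be joined to each member of the other.
combinations = [
    f"{light}-attB3-{heavy}"
    for light, heavy in product(light_chain_variants, heavy_chain_variants)
]

for construct in combinations:
    print(construct)
print(len(combinations))  # 3 x 2 = 6 joined constructs
```

The number of possible joined constructs is the product of the library sizes, so even modest libraries may yield a large population of candidate molecules for screening.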

In other preferred embodiments, one of the segments may represent a single nucleic acid sequence while the other represents a library. The result of recombination will be a population of sequences all of which have one portion in common and are varied in the other portion. Embodiments of this type will be useful for the generation of a library of fusion constructs. For example, DNA-A may comprise a regulatory sequence for directing expression (i.e., a promoter) and a sequence encoding a purification tag. Suitable purification tags include, but are not limited to, glutathione S-transferase (GST), the maltose binding protein (MBP), epitopes, defined amino acid sequences such as epitopes, haptens, six histidines (HIS6), and the like. DNA-B may comprise a library of mutated forms of a protein of interest. The resultant constructs could be assayed for a desired characteristic such as enzymatic activity or ligand binding.

Alternatively, DNA-B might comprise the common portion of the resulting fusion molecule. In some embodiments, the above described methods may be used to facilitate the fusion of promoter regions or transcription termination signals to the 5′-end or 3′-end of structural genes, respectively, to create expression cassettes designed for expression in different cellular contexts, for example, by adding a tissue-specific promoter to a structural gene.

In some embodiments, one or more of the segments may represent a sequence encoding members of a random peptide library. This approach might be used, for example, to generate a population of molecules with a certain desirable characteristic. For example, one segment might contain a sequence coding for a DNA binding domain while the other segment represents a random protein library. The resulting population might be screened for the ability to modulate the expression of a target gene of interest. In other embodiments, both segments may represent sequences encoding members of a random protein library and the resultant synthetic proteins (e.g., fusion proteins) could be assayed for any desirable characteristic such as, for example, binding a specific ligand or receptor or possessing some enzymatic activity.

It is not necessary that the nucleic acid segments encode an amino acid sequence. For example, both of the segments may direct the transcription of an RNA molecule that is not translated into protein. This will be useful for the construction of tRNA molecules, ribozymes and anti-sense molecules. Alternatively, one segment may direct the transcription of an untranslated RNA molecule while the other codes for a protein. For example, DNA-A may direct the transcription of an untranslated leader sequence that enhances protein expression such as the encephalomyocarditis virus leader sequence (EMC leader) while DNA-B encodes a peptide, polypeptide or protein of interest. In some embodiments, a segment comprising a leader sequence might further comprise a sequence encoding an amino acid sequence. For example, DNA-A might have a nucleic acid sequence corresponding to an EMC leader sequence and a purification tag while DNA-B has a nucleic acid sequence encoding a peptide, polypeptide or protein of interest.

The above process is especially useful for the preparation of combinatorial libraries of single-chain antigen-binding proteins. Methods for preparing single-chain antigen-binding proteins are known in the art. (See, e.g., PCT Publication No. WO 94/07921, the entire disclosure of which is incorporated herein by reference.) Using the constructs shown in FIG. 6 for illustration, DNA-A could encode, for example, mutated forms of the variable domain of an antibody light chain and DNA-B could encode, for example, mutated forms of the variable domain of an antibody heavy chain. Further, the intervening nucleic acid between DNA-A and DNA-B could encode a peptide linker for connecting the light and heavy chains. Cells which express the single-chain antigen-binding proteins can then be screened to identify those which produce molecules that bind to a particular antigen.

Numerous variations of the above are possible. For example, instead of using a construct illustrated in FIG. 6, a construct such as that illustrated in FIG. 2 could be used with the linker peptide coding region being embedded in the recombination site. This is one example of recombination site embedded functionality discussed above.

As another example, single-chain antigen-binding proteins composed of two antibody light chains and two antibody heavy chains can also be produced. These single-chain antigen-binding proteins can be designed to associate and form multivalent antigen binding complexes. Using the constructs shown in FIG. 2 again for illustration, DNA-A and DNA-B could each encode, for example, mutated forms of the variable domain of an antibody light chain. At the same site in a similar vector or at another site in a vector which is designed for the insertion of four nucleic acid inserts, DNA-A and DNA-B could each encode, for example, mutated forms of the variable domain of an antibody heavy chain. Cells which express both single-chain antigen-binding proteins could then be screened to identify, for example, those which produce multivalent antigen-binding complexes having specificity for a particular antigen.

Thus, the methods of the invention can be used, for example, to prepare and screen combinatorial libraries to identify cells which produce antigen-binding proteins (e.g., antibodies and/or antibody fragments or antibody fragment complexes comprising variable heavy or variable light domains) having specificities for particular epitopes. The invention also includes methods for preparing antigen-binding proteins and antigen-binding proteins prepared by the methods of the invention.

Example 2

Simultaneous Cloning of Two Nucleic Acid Fragments Using an LR Reaction to Join the Segments and a BP Reaction to Insert the Segments into a Vector

As shown in FIG. 3, a first nucleic acid segment flanked by an attB recombination site and an attL recombination site may be joined to a second nucleic acid segment flanked by an attR recombination site that is compatible with the attL site present on the first nucleic acid segment and flanked by an attB site that may be the same as or different from the attB site present on the first segment. FIG. 3 shows an embodiment wherein the two attB sites are different. The two segments may be contacted with a vector containing attP sites in a BP reaction.

A subsequent LR reaction would generate a product consisting of DNA-A and DNA-B separated by either an attP site or an attB site (the product of the LR reaction) and cloned into the vector backbone. In the embodiment shown in FIG. 3, the attL and attR sites are arranged so as to generate an attB site between the segments upon recombination. In other embodiments, the attL and the attR may be oriented differently so as to produce an attP site between the segments upon recombination. In preferred embodiments, after recombination, the two segments may be separated by an attB site.

Those skilled in the art can readily optimize the conditions for conducting the reactions described above without the use of undue experimentation. In a typical reaction from about 50 ng to about 1000 ng of vector may be contacted with the fragments to be cloned under suitable reaction conditions. Each fragment may be present in a molar ratio of from about 25:1 to about 1:25 vector:fragment. In some embodiments, one or more of the fragments may be present at a molar ratio of from about 10:1 to about 1:10 vector:fragment. In a preferred embodiment, each fragment may be present at a molar ratio of about 1:1 vector:fragment.
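Because the ratios above are molar rather than mass ratios, the mass of each fragment depends on its length relative to the vector. The following sketch shows one way the conversion might be carried out. It assumes an average mass of about 660 g/mol per base pair of double-stranded DNA, a common approximation that is not taken from this text; the vector and fragment sizes are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (approximation, an assumption)


def ng_to_fmol(ng, length_bp):
    """Convert a mass of double-stranded DNA (ng) to femtomoles."""
    return ng * 1e6 / (length_bp * AVG_BP_MASS)


def fragment_ng_for_ratio(vector_ng, vector_bp, fragment_bp, vector_per_fragment=1.0):
    """Mass of fragment (ng) giving the requested vector:fragment molar ratio."""
    vector_fmol = ng_to_fmol(vector_ng, vector_bp)
    fragment_fmol = vector_fmol / vector_per_fragment
    return fragment_fmol * fragment_bp * AVG_BP_MASS / 1e6


# Hypothetical case: 300 ng of a 6,000 bp vector and a 1,000 bp fragment.
print(round(ng_to_fmol(300, 6000), 1))                   # vector amount in fmol
print(round(fragment_ng_for_ratio(300, 6000, 1000), 1))  # about 50 ng for 1:1
```

At a 1:1 molar ratio the required fragment mass scales with the ratio of fragment length to vector length, so a short fragment requires correspondingly less mass than the vector.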

Typically, the nucleic acid may be dissolved in an aqueous buffer and added to the reaction mixture. One suitable set of conditions is 4 μl CLONASE™ enzyme mixture (e.g., Invitrogen Corp. (Carlsbad, Calif.), Cat. Nos. 11791-019 and 11789-013), 4 μl 5× reaction buffer and nucleic acid and water to a final volume of 20 μl. This will typically result in the inclusion of about 200 ng of Int and about 80 ng of IHF in a 20 μl BP reaction and about 150 ng Int, about 25 ng IHF and about 30 ng Xis in a 20 μl LR reaction.

In some preferred embodiments, particularly those in which attL sites are to be recombined with attR sites, the final reaction mixture may include about 50 mM Tris HCl, pH 7.5, about 1 mM EDTA, about 1 mg/ml BSA, about 75 mM NaCl and about 7.5 mM spermidine in addition to recombination enzymes and the nucleic acids to be combined. In other preferred embodiments, particularly those in which an attB site is to be recombined with an attP site, the final reaction mixture may include about 25 mM Tris HCl, pH 7.5, about 5 mM EDTA, about 1 mg/ml bovine serum albumin (BSA), about 22 mM NaCl, and about 5 mM spermidine.

When it is desired to conduct both a BP and an LR reaction without purifying the nucleic acids in between, the BP reaction can be conducted first and then the reaction conditions adjusted to about 50 mM NaCl, about 3.8 mM spermidine, about 3.4 mM EDTA and about 0.7 mg/ml BSA by the addition of the LR CLONASE™ enzymes and concentrated NaCl. The reaction solution may be incubated at a suitable temperature such as, for example, 25° C. for from about 60 minutes to 16 hours. After the recombination reaction, the solution may be used to transform competent host cells and the host cells screened as described above.

One example of a “one-tube” reaction protocol, which facilitates the transfer of PCR products directly to Expression Clones in a two-step reaction performed in a single tube, follows. This process can also be used to transfer a gene from one Expression Clone plasmid backbone to another. The Expression Clone is first linearized within the plasmid backbone to achieve the optimal topology for the BP reaction and to eliminate false-positive colonies due to co-transformation.

A twenty-five μl BP reaction mixture is prepared in a 1.5 ml tube with the following components:

attB DNA (100-200 ng)              1-12.5 μl
attP DNA (pDONR201) 150 ng/μl      2.5 μl
BP Reaction Buffer                 5.0 μl
TE                                 to 20 μl
BP Clonase                         5.0 μl
Total vol.                         25 μl

The contents of the tube are mixed and incubated for 4 hours, or longer, at 25° C. If the PCR product is amplified from a plasmid template containing selectable markers present on the GATEWAY™ pDONR or pDEST vectors (i.e., kan^(r) or amp^(r)), the PCR product may be treated with the restriction endonuclease DpnI to degrade the plasmid. Such plasmids are a potential source of false-positive colonies in the transformation of GATEWAY™ reactions. Further, when the template for PCR or starting Expression Clone has the same selectable marker as the final Destination Vector (e.g., amp^(r)), plating on LB plates containing 100 μg/ml ampicillin can be used to determine the number of false-positive colonies carried over to the LR reaction step.

Five μl of the reaction mixture is transferred to a separate tube to which is added 0.5 μl Proteinase K Solution. This tube is then incubated for 10 minutes at 37° C. One hundred μl of competent cells are then transformed with 1-2 μl of the mixture and plated on LB plates containing 50 μg/ml kanamycin. This yields colonies for isolation of individual Entry Clones and for assessment of the BP Reaction efficiency.

The following components are added to the remaining 20 μl BP reaction described above:

NaCl 0.75 M                        1 μl
Destination Vector 150 ng/μl       3 μl
LR Clonase                         6 μl
Total vol.                         30 μl

The mixture is then incubated at 25° C. for 2 hours, after which 3 μl of proteinase K solution is added, followed by a further incubation of 10 minutes at 37° C. Then 1-2 μl of this mixture is used to transform 100 μl competent cells, which are then plated on LB plates containing 100 μg/ml ampicillin.
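The volume bookkeeping of the one-tube protocol above can be checked with a short calculation. The sketch uses only the volumes and stock concentrations given in the protocol; the variable names are hypothetical. It computes only the NaCl contributed by the 0.75 M stock, because the composition of the enzyme storage buffers is not given here.

```python
# Volumes in microliters, taken from the one-tube protocol above.
bp_total_ul = 25.0
removed_for_entry_clones_ul = 5.0
remaining_ul = bp_total_ul - removed_for_entry_clones_ul  # 20 ul carried forward

additions_ul = {
    "NaCl 0.75 M": 1.0,
    "Destination Vector 150 ng/ul": 3.0,
    "LR Clonase": 6.0,
}
lr_total_ul = remaining_ul + sum(additions_ul.values())

# Mass of Destination Vector added (ng) and NaCl contributed by the stock (mM).
dest_vector_ng = 150.0 * additions_ul["Destination Vector 150 ng/ul"]
nacl_from_stock_mM = 750.0 * additions_ul["NaCl 0.75 M"] / lr_total_ul

# Factor by which components of the BP reaction are diluted in the LR step.
dilution_factor = remaining_ul / lr_total_ul

print(lr_total_ul)                # 30.0
print(dest_vector_ng)             # 450.0
print(nacl_from_stock_mM)         # 25.0
print(round(dilution_factor, 2))  # 0.67
```

The 30 μl total agrees with the table, and the two-thirds dilution of the BP reaction is consistent with the reduction of BSA from about 1 mg/ml to about 0.7 mg/ml described for the combined reaction.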

Example 3

Cloning of PCR Products Using Fragments by Converting attB Sites into a Reactive Pair of attL and attR Sites in a BP Reaction and Subsequent LR Reaction

A similar strategy to that described in Example 2 can be used to recombine two PCR products and clone them simultaneously into a vector backbone. Because attL and attR sites are 100 and 125 base pairs long, respectively, whereas an attB site is only 25 base pairs in length, it may be desirable to incorporate attB sites into the PCR primers. Depending on the orientation of the attB site with respect to the nucleic acid segment being transferred, attB sites can be converted to either an attL or attR site by the BP reaction. Thus, the orientation of the attB site in the attB PCR primer determines whether the attB site is converted to attL or attR. This affords the GATEWAY™ system and methods of the invention great flexibility in the utilization of multiple att sites with unique specificity.

As shown in FIG. 4, two segments (e.g., PCR products) consisting of segment A flanked by mutated attB sites each having a different specificity (e.g., by attB1 and attB3) and segment B flanked by mutated attB sites of different specificity, wherein one of the attB sites present on segment A is the same as one of the attB sites present on segment B (e.g., segment B may contain attB3 and attB2 sites) may be joined and inserted into a vector. The segments may be reacted either individually or together with two attP site containing vectors in a BP reaction. Alternatively, the attP sites might be present on linear segments. One vector contains attP sites compatible with the attB sites present on segment A (e.g., attP1 and attP3 sites). The other vector contains attP sites compatible with the attB sites present on segment B (e.g., attP3 and attP2 sites). When linear segments are used to provide the attP sites, each attP site may be provided on a segment. The orientations of the attB3 and attP3 sites are such that an attR3 site would be generated at the 5′-end of the DNA-B segment and an attL3 site generated at the 3′-end of segment A. The resulting entry clones are mixed with a Destination Vector in a subsequent LR reaction to generate a product consisting of DNA-A and DNA-B separated by an attB3 site and cloned into the Destination Vector backbone.
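The site bookkeeping in this scheme can be sketched in a few lines of Python. This is a toy model that tracks site names only, not sequences or reaction chemistry; the function names `bp_convert` and `lr_join` and the dictionary that stands in for the orientation of each attP site on the donor vector are assumptions made for illustration.

```python
# Toy model of att-site bookkeeping: site names only, no sequences.
# All function and parameter names are hypothetical.

def bp_convert(segment, donor_sites):
    """Convert the attB sites flanking a segment in a BP reaction.

    segment     -- (left_site, name, right_site), e.g. ("attB1", "DNA-A", "attB3")
    donor_sites -- dict mapping a specificity label to "L" or "R", the site
                   type left on the segment; this stands in for the
                   orientation of the matching attP site on the donor.
    """
    left, name, right = segment
    converted = []
    for site in (left, right):
        if not site.startswith("attB"):
            raise ValueError(f"{site} is not an attB site")
        spec = site[len("attB"):]
        if spec not in donor_sites:
            raise ValueError(f"no compatible attP{spec} site on the donor")
        converted.append(f"att{donor_sites[spec]}{spec}")
    return (converted[0], name, converted[1])


def lr_join(clone_a, clone_b):
    """Join two entry clones through an attL/attR pair of equal specificity."""
    a_left, a_name, a_right = clone_a
    b_left, b_name, b_right = clone_b
    reactive = (
        a_right.startswith("attL")
        and b_left.startswith("attR")
        and a_right[4:] == b_left[4:]
    )
    if not reactive:
        raise ValueError("inner sites are not a reactive attL/attR pair")
    linker = "attB" + a_right[4:]  # the LR reaction leaves an attB site behind
    return (a_left, f"{a_name}-{linker}-{b_name}", b_right)


# Segment A: attB1/attB3 reacted with an attP1/attP3 donor.
entry_a = bp_convert(("attB1", "DNA-A", "attB3"), {"1": "L", "3": "L"})
# Segment B: attB3/attB2 reacted with a donor whose attP3 site is reversed.
entry_b = bp_convert(("attB3", "DNA-B", "attB2"), {"3": "R", "2": "L"})
joined = lr_join(entry_a, entry_b)
print(entry_a, entry_b, joined)
```

The joined product still carries attL1 and attL2 at its outer ends, which is what allows it to react with a Destination Vector in the same LR reaction.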

This basic scheme has been used to link two segments, an attL1-fragment A-attL3 entry clone that is reacted with an attR3-fragment B-attL2 entry clone, and to insert the linked fragments into the destination vector. To generate the appropriate entry clones, two attP Donor vectors were constructed consisting of attP1-ccdB-attP3 and attP3R-ccdB-attP2 such that they could be reacted with appropriate attB PCR products in order to convert the attB sites to attL and attR sites. The designation attP3R is used to indicate that the orientation of the attP3 site is such that reaction with a DNA segment having a cognate attB site will result in the production of an attR site on the segment. This is represented schematically in FIG. 4 by the reversed orientation of the stippled and lined sections of the attB3 on segment B as compared to segment A. On segment B the stippled portion is adjacent to the segment while on segment A the lined portion is adjacent to the segment.

This methodology was exemplified by constructing a DNA segment in which the tetracycline resistance gene (tet) was recombined with the β-galactosidase gene such that the two genes were separated by an attB site in the product. The tet gene was PCR amplified with 5′-attB1 and 3′-attB3 ends. The lacZ gene was PCR amplified with 5′-attB3R and 3′-attB2 ends. The two PCR products were precipitated with polyethylene glycol (PEG). The B1-tet-B3 PCR product was mixed with an attP1-ccdB-attP3 donor vector and reacted with BP CLONASE™ using a standard protocol to generate an attL1-tet-attL3 entry clone. A correct tet entry clone was isolated and plasmid DNA prepared using standard techniques. In a similar fashion, the attB3R-lacZ-attB2 PCR product was mixed with an attP3R-ccdB-attP2 donor vector and reacted with BP CLONASE™ to generate an attR3-lacZ-attL2 entry clone.

In order to join the two segments in a single vector, an LR CLONASE™ reaction was prepared in a reaction volume of 20 μl containing the following components: 60 ng (25 fmoles) of the supercoiled tet entry clone; 75 ng (20 fmoles) of the supercoiled lacZ entry clone; 150 ng (35 fmoles) of pDEST6 (described in PCT Publication WO 00/52027, the entire disclosure of which is incorporated herein by reference) linearized with NcoI; 4 μl reaction buffer and 4 μl of LR CLONASE™. The final reaction mixture contained 51 mM Tris.HCl, 1 mM EDTA, 1 mg/ml BSA, 76 mM NaCl, 7.5 mM spermidine, 160 ng of Int, 35 ng of IHF and 35 ng of Xis. The reaction was incubated at 25° C. overnight and stopped with 2 μl of proteinase K solution (2 mg/ml). A 2 μl aliquot was used to transform 100 μl of E. coli DH5α LE cells and plated on LB plates containing ampicillin and XGal. Approximately 35,000 colonies were generated in the transformation mixture with cells at an efficiency of 1.6×10⁸ cfu/μg of pUC DNA. All the colonies appeared blue, indicating the presence of the lacZ gene. 24 colonies were streaked onto plates containing tetracycline and XGal. All of the colonies tested, 24/24, were resistant to tetracycline. 12 colonies were used to inoculate 2 ml of LB broth containing ampicillin for minipreps. 12/12 minipreps contained a supercoiled plasmid of the correct size (7 kb).
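The mass and molar amounts quoted for the LR reaction are related through the length of each DNA molecule. The following sketch shows the usual conversion, assuming an average mass of 660 g/mol per base pair of double-stranded DNA (a common approximation that is not stated in this document). The molecule lengths it prints are only what the quoted ng/fmol pairs imply under that assumption; they are not values given in the text.

```python
# Assumption: average mass of one base pair of double-stranded DNA.
AVG_BP_MASS = 660.0  # g/mol per bp


def fmol_from_ng(mass_ng, length_bp):
    """Convert a mass of dsDNA (ng) of known length (bp) to femtomoles."""
    # ng -> g is 1e-9, mol -> fmol is 1e15, so the net factor is 1e6.
    return mass_ng * 1e6 / (length_bp * AVG_BP_MASS)


def implied_length_bp(mass_ng, amount_fmol):
    """Length (bp) implied by a quoted mass (ng) and amount (fmol)."""
    return mass_ng * 1e6 / (amount_fmol * AVG_BP_MASS)


# (ng, fmol) pairs as quoted for the LR reaction above.
quoted = {
    "tet entry clone": (60, 25),
    "lacZ entry clone": (75, 20),
    "pDEST6": (150, 35),
}
for name, (ng, fmol) in quoted.items():
    print(f"{name}: about {implied_length_bp(ng, fmol):.0f} bp implied")
```

A calculation of this kind is mainly useful for keeping the entry clones and the Destination Vector at comparable molar amounts when the molecules differ in size.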

In some embodiments, such as that shown in FIG. 5, two segments can be reacted with a vector containing a single recombination site in order to convert one of the recombination sites on the segments into a different recombination site. In some embodiments, segments containing attB sites may be reacted with a target vector having attP sites. For example, segments A and B are reacted either together or separately with a vector having an attP3 site in order to convert the attB3 sites on the segments into an attL3 and an attR3, respectively. This is done so that the subsequent LR reaction between the two segments results in their being joined by an attB site. The segments may be joined with the attP site containing vector before, simultaneously with or after the recombination reaction to convert the sites to generate a co-integrate molecule consisting of DNA-A flanked by attL1 and attL3 and DNA-B flanked by attR3 and attL2. A subsequent LR reaction will generate a product clone consisting of DNA-A and DNA-B separated by attB3 cloned into a vector backbone.

In some embodiments, an attP site designed to convert the attB used to link the segments to a reactive pair of attL and attR sites may be provided as shorter segments such as restriction fragments, duplexes of synthetic oligonucleotides or PCR fragments. Reactions involving a linear fragment in a BP reaction may require longer incubation times, such as overnight incubation.

The conversion of attB sites to attL or attR sites can also be accomplished solely by PCR. PCR primers containing attL or attR sites can be used to amplify a segment having an attB site on the end. Since the sequence of attL and attR sites contains a portion of the sequence of an attB site, the attB site in this case serves as an overlap region to which the attL or attR PCR primer can anneal. Extension of the annealed attL or attR primer through to the end of the PCR product will generate a fusion template for PCR amplification of the full length PCR product using flanking primers that anneal to the ends of the attL or attR sites. The primers for the PCR reaction may be provided as single stranded oligonucleotides. In some preferred embodiments, the primers may be provided as a duplex, for example, as the product of a PCR reaction to amplify either an attL or attR site.

Example 4 Cloning of Two or More Nucleic Acid Fragments into Different Places in the Same Vector

Two or more nucleic acid fragments can be cloned simultaneously into different regions of a vector having multiple sets of recombination sites each flanking a selectable marker. In some embodiments, one or more of the selectable markers may be a negative selectable marker.

As shown in FIG. 6, two nucleic acid segments A and B, which may be present as discrete fragments or as part of a larger nucleic acid molecule such as a plasmid, can be simultaneously cloned into the same destination vector. Nucleic acid segment A (DNA-A) flanked by recombination sites that do not recombine with each other (e.g., attL1 and attL2) and nucleic acid segment B (DNA-B) flanked by recombination sites that do not recombine with each other and do not recombine with the sites flanking segment A (e.g., attL3 and attL4) may be combined with a Destination Vector in an LR reaction. The Destination Vector will contain two pairs of recombination sites, each pair selected to recombine with the sites flanking one of the segments. As an example, FIG. 6 shows two pairs of attR sites (attR1/attR2 and attR3/attR4) each flanking a ccdB negative selectable marker. The three nucleic acids can be combined in a single LR reaction. The resulting product will consist of DNA-A and DNA-B flanked by pairs of attB sites and cloned into distinct regions of the Destination Vector.
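The requirement that each segment's sites match exactly one cassette in the Destination Vector, and that no specificity is reused, can be checked mechanically. The sketch below is a hypothetical helper (`plan_lr_insertion` is not part of any kit or library) and covers only the oriented case, in which every site has a distinct specificity.

```python
# Hypothetical planning helper; it works on site names only.

def specificity(site):
    """Return the specificity label of a site name, e.g. 'attL3' -> '3'."""
    return site[4:]


def plan_lr_insertion(segments, cassettes):
    """Match each attL-flanked segment to the attR cassette it can replace.

    segments  -- dict of name -> (left_site, right_site),
                 e.g. {"DNA-A": ("attL1", "attL2")}
    cassettes -- list of (left_site, right_site) attR pairs, each flanking
                 a selectable marker in the Destination Vector
    Returns a dict of segment name -> index of the matching cassette.
    """
    used = [specificity(s) for pair in segments.values() for s in pair]
    if len(used) != len(set(used)):
        raise ValueError("a specificity is reused; sites could cross-react")
    plan = {}
    for name, (left, right) in segments.items():
        wanted = ("attR" + specificity(left), "attR" + specificity(right))
        if wanted not in cassettes:
            raise ValueError(f"no cassette {wanted} available for {name}")
        plan[name] = cassettes.index(wanted)
    return plan


plan = plan_lr_insertion(
    {"DNA-A": ("attL1", "attL2"), "DNA-B": ("attL3", "attL4")},
    [("attR1", "attR2"), ("attR3", "attR4")],
)
print(plan)
```

The uniqueness check is deliberately strict: embodiments that place sites of the same specificity on both ends of a segment would need that check relaxed.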

As shown in FIG. 7, an analogous method for inserting nucleic acid segments into a vector can be accomplished using a BP reaction. For example, DNA-A flanked by recombination sites attB1 and attB2 can be combined with DNA-B flanked by recombination sites attB3 and attB4 and a vector containing attP sites in a BP reaction. The resulting product would consist of DNA-A and DNA-B cloned between pairs of attL sites into distinct regions of the vector. In some embodiments, it may be desirable to insert the segments into the target vector sequentially and isolate an intermediate molecule comprising only one of the segments.

It is not necessary that all of the sites be derived from the same recombination system. For example, one segment may be flanked by lox sites while the other segment is flanked by att sites. A segment may have a lox site on one end and an att site on the other end or an frt site on one end. Various combinations of sites may be envisioned by those skilled in the art and such combinations are within the scope of the present invention.

In some embodiments, it may be desirable to isolate intermediates in the reaction shown in FIGS. 6 and 7. For example, it may be desirable to isolate a vector having only one of the segments inserted. The intermediate might be used as is or might serve as the substrate in a subsequent recombination reaction to insert the second segment.

In some embodiments, the present invention is a method of cloning n nucleic acid segments, wherein n is an integer greater than 1, comprising the steps of providing n nucleic acid segments, each segment flanked by two unique recombination sites, providing a vector comprising 2n recombination sites wherein each of the 2n recombination sites is capable of recombining with one of the recombination sites flanking one of the nucleic acid segments and conducting a recombination reaction such that the n nucleic acid segments are recombined into the vector thereby cloning the n nucleic acid segments. In further embodiments, the vector comprises n copies of a selectable marker each copy flanked by two recombination sites. In other embodiments, the vector comprises two or more different selectable markers each flanked by two recombination sites. In some embodiments, one or more of the selectable markers may be a negative selectable marker.

In some embodiments, the present invention provides a method of cloning, comprising the steps of providing a first, a second and a third nucleic acid segment, wherein the first nucleic acid segment is flanked by a first and a second recombination site, the second nucleic acid segment is flanked by a third and a fourth recombination site and the third nucleic acid segment is flanked by a fifth and a sixth recombination site, wherein the second recombination site is capable of recombining with the third recombination site and none of the first, fourth, fifth or sixth recombination sites is capable of recombining with any of the first through sixth recombination sites, providing a vector comprising a seventh and an eighth recombination site flanking a first selectable marker and comprising a ninth and a tenth recombination site flanking a second selectable marker wherein none of the seventh through tenth recombination sites can recombine with any of the seventh through tenth recombination sites, conducting a first recombination reaction such that the second and the third recombination sites recombine and conducting a second recombination reaction such that the first and the fourth recombination sites recombine with the seventh and the eighth recombination sites respectively and the fifth and the sixth recombination sites recombine with the ninth and the tenth recombination sites thereby cloning the first, second and third nucleic acid segments.

In some embodiments, a nucleic acid segment may comprise a sequence that functions as a promoter. In some embodiments, the first and the second nucleic acid segments may comprise a sequence encoding a polypeptide and the recombination places both polypeptides in the same reading frame. In some embodiments, a nucleic acid segment may comprise a sequence that functions as a transcription termination sequence.

The present invention provides an extremely versatile method for the modular construction of nucleic acids and proteins. Both the inserted nucleic acid segments and the vector can contain sequences selected so as to confer desired characteristics on the product molecules. In those embodiments exemplified in FIGS. 6 and 7, in addition to the inserted segments, one or more of the portions of the vector adjacent to the inserted segments as well as the portion of the vector separating the inserted segments can contain one or more selected sequences.

In some embodiments, the selected sequences might encode ribozymes, epitope tags, structural domains, selectable markers, internal ribosome entry sequences, promoters, enhancers, recombination sites and the like. In some preferred embodiments, the portion of the vector separating the inserted segments may comprise one or more selectable markers flanked by a reactive pair of recombination sites in addition to the recombination sites used to insert the nucleic acid segments.

This methodology will be particularly well suited for the construction of gene targeting vectors. For example, the segment of the vector between the pairs of recombination sites may encode one or more selectable markers such as the neomycin resistance gene. Segments A and B may contain nucleic acid sequences selected so as to be identical or substantially identical to a portion of a gene target that is to be disrupted. After the recombination reaction, the Destination Vector will contain two portions of a gene of interest flanking a positive selectable marker. The vector can then be inserted into a cell using any conventional technology, such as transfection, whereupon the portions of the gene of interest present on the vector can recombine with the homologous portions of the genomic copy of the gene. Cells containing the inserted vector can be selected based upon one or more characteristics conferred by the selectable marker, for example, in the case when the selectable marker is the neomycin resistance gene, their resistance to G-418.

In some embodiments, one or more negative selectable markers may be included in the portion of the Destination Vector that does not contain the target gene segments and the positive selectable marker. The presence of one or more negative selectable markers permits the selection against cells in which the entire Destination Vector was inserted into the genome or against cells in which the Destination Vector is maintained extrachromosomally.

In some preferred embodiments, additional recombination sites may be positioned adjacent to the recombination sites used to insert the nucleic acid segments. Molecules of this type will be useful in gene targeting applications where it is desirable to remove the selectable marker from the targeted gene after targeting, the so-called “hit and run” methodology. Those skilled in the art will appreciate that the segments containing homologous sequence need not necessarily correspond to the sequence of a gene. In some instances, the sequences may be selected to be homologous to a chromosomal location other than a gene.

This methodology is also well suited for the construction of bi-cistronic expression vectors. In some embodiments, expression vectors contain bi-cistronic expression elements in which two structural genes are expressed from a single promoter and are separated by an internal ribosome entry sequence (IRES, see Encarnación, Current Opinion in Biotechnology 10:458-464 (1999), specifically incorporated herein by reference). Such vectors can be used to express two proteins from a single construct.

In some embodiments, it may not be necessary to control the orientation of one or more of the nucleic acid segments and recombination sites of the same specificity can be used on both ends of the segment. With reference to FIG. 6, if the orientation of segment A with respect to segment B were not critical, segment A could be flanked by L1 sites on both ends and the vector equipped with two R1 sites. This might be useful in generating additional complexity in the formation of combinatorial libraries between segments A and B.

Example 5 Combining Multiple Fragments into a Single Site in a Vector

In some embodiments, the present invention provides a method of cloning n nucleic acid segments, wherein n is an integer greater than 1, comprising the steps of providing a 1^(st) through an n^(th) nucleic acid segment, each segment flanked by two unique recombination sites, wherein the recombination sites are selected such that one of the two recombination sites flanking the i^(th) segment, n_(i), reacts with one of the recombination sites flanking the (i−1)^(th) segment, n_(i−1), and the other recombination site flanking the i^(th) segment reacts with one of the recombination sites flanking the (i+1)^(th) segment, n_(i+1), providing a vector comprising at least two recombination sites wherein one of the two recombination sites on the vector reacts with one of the sites on the 1^(st) nucleic acid segment and another site on the vector reacts with a recombination site on the n^(th) nucleic acid segment. It is a further object of the present invention to provide a method of cloning, comprising the steps of providing a first, a second and a third nucleic acid segment, wherein the first nucleic acid segment is flanked by a first and a second recombination site, the second nucleic acid segment is flanked by a third and a fourth recombination site and the third nucleic acid segment is flanked by a fifth and a sixth recombination site, wherein the second recombination site is capable of recombining with the third recombination site and the fourth recombination site is capable of recombining with the fifth recombination site, providing a vector having at least a seventh and an eighth recombination site such that the seventh recombination site is capable of reacting with the first recombination site and the eighth recombination site is capable of reacting with the sixth recombination site and conducting at least one recombination reaction such that the second and the third recombination sites recombine, the fourth and the fifth recombination sites recombine, the first and the seventh recombination sites recombine and the sixth and the eighth recombination sites recombine thereby cloning the first, second and third nucleic acid segments. In some embodiments, at least one nucleic acid segment comprises a sequence that functions as a promoter.

In some embodiments, at least two nucleic acid segments comprise sequences encoding a polypeptide and the recombination places both polypeptides in the same reading frame. In some embodiments, at least one nucleic acid segment comprises a sequence that functions as a transcription termination sequence. In some embodiments, at least one fragment comprises an origin of replication. In some embodiments, at least one fragment comprises a sequence coding for a selectable marker.

This embodiment is exemplified in FIGS. 8 and 9 for the case when n=3. In this embodiment, the present invention provides a method of cloning, comprising the steps of providing a first, a second and a third nucleic acid segment, wherein the first nucleic acid segment is flanked by a first and a second recombination site, the second nucleic acid segment is flanked by a third and a fourth recombination site and the third nucleic acid segment is flanked by a fifth and a sixth recombination site, wherein the second recombination site is capable of recombining with the third recombination site and the fourth recombination site is capable of recombining with the fifth recombination site, providing a vector comprising a seventh and an eighth recombination site and conducting at least one recombination reaction such that the second and the third recombination sites recombine and the fourth and the fifth recombination sites recombine and the first and the sixth recombination sites recombine with the seventh and the eighth recombination sites respectively, thereby cloning the first, second and third nucleic acid segments.
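The chaining rule for n segments (each junction shared by exactly two neighbors, with the two outer sites matching the vector) can be illustrated with a small ordering routine. This is a sketch under simplifying assumptions: each site is reduced to a specificity label, a site is taken to react only with a partner of the same label, and the labels "1" through "4" in the example are hypothetical rather than taken from FIGS. 8 and 9.

```python
# Hypothetical helper: sites are reduced to specificity labels.

def order_segments(segments, vector):
    """Order segments into the single chain that fits between the vector sites.

    segments -- dict of name -> (left_spec, right_spec)
    vector   -- (left_spec, right_spec) for the two sites on the vector
    Returns the segment names in assembly order.
    """
    by_left = {left: name for name, (left, _) in segments.items()}
    if len(by_left) != len(segments):
        raise ValueError("two segments share the same left specificity")
    order = []
    current = vector[0]
    # Follow the chain: the right site of one segment selects the next.
    while current in by_left and by_left[current] not in order:
        name = by_left[current]
        order.append(name)
        current = segments[name][1]
    if len(order) != len(segments) or current != vector[1]:
        raise ValueError("segments do not form one chain between the vector sites")
    return order


# n = 3: three segments supplied in arbitrary order.
order = order_segments(
    {"B": ("3", "4"), "A": ("1", "3"), "C": ("4", "2")},
    ("1", "2"),
)
print(order)
```

Because every junction uses its own specificity, the order of the segments in the product is fixed by the choice of sites rather than by the order in which the fragments are mixed.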

As discussed above, when the orientation of a given segment is not critical, the invention may be modified by placing recombination sites having the same specificity on both ends of the given segment and adjusting the recombination sites of the adjacent segments and/or the recombination sites in the vector accordingly.

In addition to the utilities discussed above for the combination of two fragments in a single vector, embodiments of this type will be useful for the construction of vectors from individual fragments containing various functions. Thus, the invention provides a modular method for the construction of vectors.

In some embodiments, at least one nucleic acid segment comprises a sequence that functions as a promoter. In some embodiments, at least two nucleic acid segments comprise a sequence encoding a polypeptide and the recombination places both polypeptides in the same reading frame. In some embodiments, at least one nucleic acid segment comprises a sequence that functions as a transcription termination sequence. In some embodiments, at least one fragment comprises an origin of replication. In some embodiments, at least one fragment comprises a sequence coding for a selectable marker. In some embodiments, a fragment may comprise sequence coding for more than one function. In some embodiments, a fragment may comprise sequence coding for an origin of replication and sequence encoding a selectable marker.

When multiple nucleic acid segments are inserted into vectors using methods of the invention, expression of these segments may be driven by the same regulatory sequence or different regulatory sequences. FIG. 20A shows one example of a vector which contains two inserted DNA segments, the expression of which is driven by different promoters (i.e., two different T7 promoters).

The methods of the invention may also be used to produce constructs which allow for silencing of genes in vivo. One method of silencing genes involves the production of double-stranded RNA, termed RNA interference (RNAi). (See, e.g., Mette et al., EMBO J, 19:5194-5201 (2000)). Methods of the invention can be used in a number of ways to produce molecules such as RNAi. Thus, expression products of nucleic acid molecules of the invention can be used to silence gene expression.

Nucleic acid molecules of the invention may be prepared to generate interfering RNAs (RNAi). RNAi is double-stranded RNA that results in degradation of specific mRNAs, and can also be used to lower or eliminate gene expression. Nucleic acid molecules of the invention may be engineered, for example, to produce dsRNA molecules by, for example, engineering nucleic acid molecules to have a sequence that, when transcribed, folds back upon itself to generate a hairpin molecule containing a double-stranded portion. One strand of the double-stranded portion may correspond to all or a portion of the sense strand of the mRNA transcribed from the gene to be silenced while the other strand of the double-stranded portion may correspond to all or a portion of the antisense strand. Other methods of producing a double-stranded RNA molecule may be used, for example, nucleic acid molecules may be engineered to have a first sequence that, when transcribed, corresponds to all or a portion of the sense strand of the mRNA transcribed from the gene to be silenced and a second sequence that, when transcribed, corresponds to all or a portion of an antisense strand (i.e., the reverse complement) of the mRNA transcribed from the gene to be silenced. This may be accomplished by putting the first and the second sequence on the same strand of the vector each under the control of its own promoter. Alternatively, two promoters may be positioned on opposite strands of the vector such that expression from each promoter results in transcription of one strand of the double-stranded RNA. In some embodiments, it may be desirable to have the first sequence on one nucleic acid molecule and the second sequence on a second nucleic acid molecule and to introduce both vectors or molecules into a cell containing the gene to be silenced.

In other embodiments, a nucleic acid molecule containing only the antisense strand may be introduced and the mRNA transcribed from the gene to be silenced may serve as the other strand of the double-stranded RNA. In some embodiments, a dsRNA to be used to silence a gene may have one or more regions of homology to a gene to be silenced. Regions of homology may be from about 20 bp to about 5 kbp in length, 20 bp to about 4 kbp in length, 20 bp to about 3 kbp in length, 20 bp to about 2.5 kbp in length, from about 20 bp to about 2 kbp in length, 20 bp to about 1.5 kbp in length, from about 20 bp to about 1 kbp in length, 20 bp to about 750 bp in length, from about 20 bp to about 500 bp in length, 20 bp to about 400 bp in length, 20 bp to about 300 bp in length, 20 bp to about 250 bp in length, from about 20 bp to about 200 bp in length, from about 20 bp to about 150 bp in length, from about 20 bp to about 100 bp in length, from about 20 bp to about 90 bp in length, from about 20 bp to about 80 bp in length, from about 20 bp to about 70 bp in length, from about 20 bp to about 60 bp in length, from about 20 bp to about 50 bp in length, from about 20 bp to about 40 bp in length, from about 20 bp to about 30 bp in length, from about 20 bp to about 25 bp in length, from about 15 bp to about 25 bp in length, from about 17 bp to about 25 bp in length, from about 19 bp to about 25 bp in length, from about 19 bp to about 23 bp in length, or from about 19 bp to about 21 bp in length.

As discussed above, a hairpin containing molecule having a double-stranded region may be used as RNAi. The length of the double stranded region may be from about 20 bp to about 2.5 kbp in length, from about 20 bp to about 2 kbp in length, 20 bp to about 1.5 kbp in length, from about 20 bp to about 1 kbp in length, 20 bp to about 750 bp in length, from about 20 bp to about 500 bp in length, 20 bp to about 400 bp in length, 20 bp to about 300 bp in length, 20 bp to about 250 bp in length, from about 20 bp to about 200 bp in length, from about 20 bp to about 150 bp in length, from about 20 bp to about 100 bp in length, 20 bp to about 90 bp in length, 20 bp to about 80 bp in length, 20 bp to about 70 bp in length, 20 bp to about 60 bp in length, 20 bp to about 50 bp in length, 20 bp to about 40 bp in length, 20 bp to about 30 bp in length, or from about 20 bp to about 25 bp in length. The non-base-paired portion of the hairpin (i.e., loop) can be of any length that permits the two regions of homology that make up the double-stranded portion of the hairpin to fold back upon one another.
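The hairpin design described above, a sense region followed by a loop and then the reverse complement of the sense region, can be sketched as follows. The target and loop sequences are arbitrary placeholders, and the default length limits simply mirror the broadest stem range stated above; none of this is a validated design rule.

```python
# Illustrative only: sequences are placeholders, not real gene targets.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def hairpin_template(target_sense, loop, min_stem=20, max_stem=2500):
    """Build a DNA template whose transcript can fold back on itself.

    target_sense -- sense-strand sequence matching the mRNA to be silenced
    loop         -- spacer that forms the non-base-paired loop
    The stem limits default to the 20 bp to 2.5 kbp range given in the text.
    """
    if not min_stem <= len(target_sense) <= max_stem:
        raise ValueError("stem length outside the chosen range")
    sense = target_sense.upper()
    return sense + loop.upper() + reverse_complement(sense)


sense = "GATTACAGATTACAGATTACA"  # 21 nt placeholder
loop = "TTCAAGAGA"               # arbitrary 9 nt spacer
template = hairpin_template(sense, loop)
print(template)
```

Placing the two arms on one transcript corresponds to the single-transcript arrangement; the two-promoter arrangements would instead express `sense` and `reverse_complement(sense)` as separate transcripts.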

Any suitable promoter may be used to control the production of RNA from the nucleic acid molecules of the invention. Promoters may be those recognized by any polymerase enzyme. For example, promoters may be promoters for RNA polymerase II or RNA polymerase III (e.g., a U6 promoter, an H1 promoter, etc.). Other suitable promoters include, but are not limited to, T7 promoter, cytomegalovirus (CMV) promoter, mouse mammary tumor virus (MMTV) promoter, metallothionein promoter, RSV (Rous sarcoma virus) long terminal repeat, SV40 promoter, and human growth hormone (hGH) promoter. Other suitable promoters are known to those skilled in the art and are within the scope of the present invention.

One example of a construct designed to produce RNAi is shown in FIG. 20B. In this construct, a DNA segment is inserted into a vector such that RNA corresponding to both strands is produced as two separate transcripts. Another example of a construct designed to produce RNAi is shown in FIG. 20C. In this construct, two copies of a DNA segment are inserted into a vector such that RNA corresponding to both strands is again produced. Yet another example of a construct designed to produce RNAi is shown in FIG. 20D. In this construct, two copies of a DNA segment are inserted into a vector such that RNA corresponding to both strands is produced as a single transcript. The exemplary vector system shown in FIG. 20E comprises two vectors, each of which contains a copy of the same DNA segment. Expression of one of these DNA segments results in the production of sense RNA while expression of the other results in the production of an anti-sense RNA. RNA strands produced from vectors represented in FIGS. 20B-20E will thus have complementary nucleotide sequences and will generally hybridize either to each other or intramolecularly under physiological conditions.

Nucleic acid segments designed to produce RNAi, such as the vectors represented in FIGS. 20B-20E, need not correspond to the full-length gene or open reading frame. For example, when the nucleic acid segment corresponds to an ORF, the segment may only correspond to part of the ORF (e.g., 50 nucleotides at the 5′ or 3′ end of the ORF). Further, while FIGS. 20B-20E show vectors designed to produce RNAi, nucleic acid segments may also perform the same function in other forms (e.g., when inserted into the chromosome of a host cell).

Gene silencing methods involving the use of compounds such as RNAi and antisense RNA, for example, are particularly useful for identifying gene functions. More specifically, gene silencing methods can be used to reduce or prevent the expression of one or more genes in a cell or organism. Phenotypic manifestations associated with the selective inhibition of gene functions can then be used to assign a role to the “silenced” gene or genes. As an example, Chuang et al., Proc. Natl. Acad. Sci. (USA) 97:4985-4990 (2000), have demonstrated that in vivo production of RNAi can alter gene activity in Arabidopsis thaliana. Thus, the invention provides methods for regulating expression of nucleic acid molecules in cells and tissues comprising the expression of RNAi and antisense RNA. The invention further provides methods for preparing nucleic acid molecules which can be used to produce RNA corresponding to one or both strands of a DNA molecule.

Similarly, the invention relates to compounds and methods for gene silencing involving ribozymes. In particular, the invention provides antisense RNA/ribozyme fusions which comprise (1) antisense RNA corresponding to a target gene and (2) one or more ribozymes which cleave RNA (e.g., hammerhead ribozyme, hairpin ribozyme, delta ribozyme, Tetrahymena L-21 ribozyme, etc.). Further provided by the invention are vectors which express these fusions, methods for producing these vectors, and methods for using these vectors to suppress gene expression.

In one embodiment, a Destination Vector is constructed which encodes a ribozyme located next to a ccdB gene, wherein the ccdB gene is flanked by attR sites. An LR reaction is used to replace the ccdB gene with a nucleic acid molecule which upon expression produces an antisense RNA molecule. Thus, the expression product will result in the production of an antisense sequence fused to the ribozyme by an intervening sequence encoded by an attB site. As discussed below in Example 13, this attB site can be removed from the transcript (e.g., using intron and exon splice sequences), if desired, or, in certain cases, nucleic acid which encodes the ribozyme can be embedded in the attB site.

Expression of antisense molecules fused to ribozymes can be used, for example, to cleave specific RNA molecules in a cell. This is so because the antisense RNA portion of the transcript can be designed to hybridize to particular mRNA molecules. Further, the ribozyme portion of the transcript can be designed to cleave the RNA molecule to which it has hybridized. For example, the ribozyme can be one which cleaves double-stranded RNA (e.g., Tetrahymena L-21 ribozyme).

Example 6 Use of Suppressor tRNAs to Generate Fusion Proteins

The recently developed recombinational cloning techniques described above permit the rapid movement of a target nucleic acid from one vector background to one or more other vector backgrounds. Because the recombination event is site specific, the orientation and reading frame of the target nucleic acid can be controlled with respect to the vector. This control makes the construction of fusions between sequences present on the target nucleic acid and sequences present on the vector a simple matter.

In general terms, a gene may be expressed in four forms: native at both amino and carboxy termini, modified at either end, or modified at both ends. A construct containing the target gene of interest may include the N-terminal methionine ATG codon, and a stop codon at the carboxy end, of the open reading frame, or ORF, thus ATG-ORF-stop. Frequently, the gene construct will include translation initiation sequences, tis, that may be located upstream of the ATG that allow expression of the gene, thus tis-ATG-ORF-stop. Constructs of this sort allow expression of a gene as a protein that contains the same amino and carboxy amino acids as in the native, uncloned, protein. When such a construct is fused in-frame with an amino-terminal protein tag, e.g., GST, the tag will have its own tis, thus tis-ATG-tag-tis-ATG-ORF-stop, and the bases comprising the tis of the ORF will be translated into amino acids between the tag and the ORF. In addition, some level of translation initiation may be expected in the interior of the mRNA (i.e., at the ORF's ATG and not the tag's ATG) resulting in a certain amount of native protein expression contaminating the desired protein.

DNA (lower case): tis1-atg-tag-tis2-atg-orf-stop

RNA (lower case, italics): tis1-atg-tag-tis2-atg-orf-stop

Protein (upper case): ATG-TAG-TIS2-ATG-ORF (tis1 and stop are not translated) + contaminating ATG-ORF (translation of ORF beginning at tis2).

Using recombinational cloning, it is a simple matter for those skilled in the art to construct a vector containing a tag adjacent to a recombination site permitting the in-frame fusion of a tag to the C- and/or N-terminus of the ORF of interest.

Given the ability to rapidly create a number of clones in a variety of vectors, there is a need in the art to maximize the number of ways a single cloned gene can be expressed without the need to manipulate the gene construct itself. The present invention meets this need by providing materials and methods for the controlled expression of a C- and/or N-terminal fusion to a target gene using one or more suppressor tRNAs to suppress the termination of translation at a stop codon. Thus, the present invention provides materials and methods in which a gene construct is prepared flanked with recombination sites.

The construct is prepared with a sequence coding for a stop codon preferably at the C-terminus of the gene encoding the protein of interest. In some embodiments, a stop codon can be located adjacent to the gene, for example, within the recombination site flanking the gene. The target gene construct can be transferred through recombination to various vectors which can provide various C-terminal or N-terminal tags (e.g., GFP, GST, His Tag, GUS, etc.) to the gene of interest. When the stop codon is located at the carboxy terminus of the gene, expression of the gene with a “native” carboxy end amino acid sequence occurs under non-suppressing conditions (i.e., when the suppressor tRNA is not expressed) while expression of the gene as a carboxy fusion protein occurs under suppressing conditions. The present invention is exemplified using an amber suppressor supF, which is a particular tyrosine tRNA gene (tyrT) mutated to recognize the UAG stop codon. Those skilled in the art will recognize that other suppressors and other stop codons could be used in the practice of the present invention.
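The effect of suppressing versus non-suppressing conditions can be illustrated with a toy translation model. The sketch below is an assumption-laden simplification: the codon table is deliberately tiny, the ORF and tag sequences are invented for illustration, and suppression is treated as all-or-none, whereas suppression in a real host is generally only partial. It shows that a UAG codon terminates translation unless an amber suppressor is present, in which case tyrosine is inserted (supF being a tyrosine tRNA) and translation continues into the tag.

```python
# Toy model: translation of an ORF-UAG-tag message with and without an
# amber suppressor. Codon table and sequences are illustrative only.
CODONS = {
    "AUG": "M", "GCU": "A", "AAA": "K", "GGC": "G", "CAU": "H",
    "UAG": "*", "UAA": "*", "UGA": "*",
}


def translate(mrna: str, amber_suppressor: bool = False) -> str:
    """Translate codon by codon, optionally reading through UAG as Tyr."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        amino_acid = CODONS[codon]
        if amino_acid == "*":
            if codon == "UAG" and amber_suppressor:
                protein.append("Y")  # tyrosine inserted at the amber codon
                continue
            break  # ordinary termination
        protein.append(amino_acid)
    return "".join(protein)


orf = "AUGGCUAAA"        # hypothetical gene of interest (M-A-K)
tag = "GGCCAUCAUUAA"     # hypothetical C-terminal tag ending in a UAA stop
message = orf + "UAG" + tag

print(translate(message))                         # native carboxy end
print(translate(message, amber_suppressor=True))  # carboxy fusion
```

Because only UAG is read through, the UAA codon at the end of the tag still terminates the fusion protein.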

In the present example, the gene coding for the suppressing tRNA has been incorporated into the vector from which the target gene is to be expressed. In other embodiments, the gene for the suppressor tRNA may be in the genome of the host cell. In still other embodiments, the gene for the suppressor may be located on a separate vector and provided in trans. In embodiments of this type, the vector containing the suppressor gene may have an origin of replication selected so as to be compatible with the vector containing the gene construct. The selection and preparation of such compatible vectors is within ordinary skill in the art. Those skilled in the art will appreciate that the selection of an appropriate vector for providing the suppressor tRNA in trans may include the selection of an appropriate antibiotic resistance marker. For example, if the vector expressing the target gene contains an antibiotic resistance marker for one antibiotic, a vector used to provide a suppressor tRNA may encode resistance to a second antibiotic. This permits the selection for host cells containing both vectors.

In some preferred embodiments, more than one copy of a suppressor tRNA may be provided in all of the embodiments described above. For example, a host cell may be provided that contains multiple copies of a gene encoding the suppressor tRNA. Alternatively, multiple gene copies of the suppressor tRNA under the same or different promoters may be provided in the same vector background as the target gene of interest. In some embodiments, multiple copies of a suppressor tRNA may be provided in a different vector than the one used to contain the target gene of interest. In other embodiments, one or more copies of the suppressor tRNA gene may be provided on the vector containing the gene for the protein of interest and/or on another vector and/or in the genome of the host cell or in combinations of the above. When more than one copy of a suppressor tRNA gene is provided, the genes may be expressed from the same or different promoters which may be the same or different as the promoter used to express the gene encoding the protein of interest.

In some embodiments, two or more different suppressor tRNA genes may be provided. In embodiments of this type one or more of the individual suppressors may be provided in multiple copies and the number of copies of a particular suppressor tRNA gene may be the same or different as the number of copies of another suppressor tRNA gene. Each suppressor tRNA gene, independently of any other suppressor tRNA gene, may be provided on the vector used to express the gene of interest and/or on a different vector and/or in the genome of the host cell. A given tRNA gene may be provided in more than one place in some embodiments. For example, a copy of the suppressor tRNA may be provided on the vector containing the gene of interest while one or more additional copies may be provided on an additional vector and/or in the genome of the host cell. When more than one copy of a suppressor tRNA gene is provided, the genes may be expressed from the same or different promoters which may be the same or different as the promoter used to express the gene encoding the protein of interest and may be the same or different as a promoter used to express a different tRNA gene.

With reference to FIG. 14, the GUS gene was cloned in frame with a GST gene separated by the TAG codon. The plasmid also contained a supF gene expressing a suppressor tRNA. The plasmid was introduced into a host cell where approximately 60 percent of the GUS gene was expressed as a fusion protein containing the GST tag. In control experiments, a plasmid containing the same GUS-stop codon-GST construct did not express a detectable amount of a fusion protein when expressed from a vector lacking the supF gene. In this example, the supF gene was expressed as part of the mRNA containing the GUS-GST fusion. Since tRNAs are generally processed from larger RNA molecules, constructs of this sort can be used to express the suppressor tRNAs of the present invention. In other embodiments, the RNA containing the tRNA sequence may be expressed separately from the mRNA containing the gene of interest.

In some embodiments of the present invention, the target gene of interest and the gene expressing the suppressor tRNA may be controlled by the same promoter. In other embodiments, the target gene of interest may be expressed from a different promoter than the suppressor tRNA. Those skilled in the art will appreciate that, under certain circumstances, it may be desirable to control the expression of the suppressor tRNA and/or the target gene of interest using a regulatable promoter. For example, the target gene of interest and/or the gene expressing the suppressor tRNA may be controlled by a promoter such as the lac promoter or derivatives thereof such as the tac promoter. In the embodiment shown, both the target gene of interest and the suppressor tRNA gene are expressed from the T7 RNA polymerase promoter. Induction of the T7 RNA polymerase turns on expression of both the gene of interest (GUS in this case) and the supF gene expressing the suppressor tRNA as part of one RNA molecule.

In some preferred embodiments, the expression of the suppressor tRNA gene may be under the control of a different promoter from that of the gene of interest. In some embodiments, it may be possible to express the suppressor gene before the expression of the target gene. This would allow levels of suppressor to build up to a high level before they are needed to allow expression of a fusion protein by suppression of the stop codon. For example, in embodiments of the invention where the suppressor gene is controlled by a promoter inducible with IPTG, the target gene is controlled by the T7 RNA polymerase promoter and the expression of the T7 RNA polymerase is controlled by a promoter inducible with an inducing signal other than IPTG, e.g., NaCl, one could turn on expression of the suppressor tRNA gene with IPTG prior to the induction of the T7 RNA polymerase gene and subsequent expression of the gene of interest. In some preferred embodiments, the expression of the suppressor tRNA might be induced about 15 minutes to about one hour before the induction of the T7 RNA polymerase gene. In a preferred embodiment, the expression of the suppressor tRNA may be induced from about 15 minutes to about 30 minutes before induction of the T7 RNA polymerase gene. In the specific example shown, the expression of the T7 RNA polymerase gene is under the control of a salt inducible promoter. A cell line having an inducible copy of the T7 RNA polymerase gene under the control of a salt inducible promoter is commercially available from Invitrogen Corp. (Carlsbad, Calif.) under the designation of the BL21 SI strain.

In some preferred embodiments, the expression of the target gene of interest and the suppressor tRNA can be arranged in the form of a feedback loop. For example, the target gene of interest may be placed under the control of the T7 RNA polymerase promoter while the suppressor gene is under the control of both the T7 promoter and the lac promoter. The T7 RNA polymerase gene itself is transcribed from both the T7 promoter and the lac promoter, and has an amber stop mutation replacing a normal tyrosine codon, e.g., the 28th codon (out of 883). No active T7 RNA polymerase can be made before levels of suppressor are high enough to give significant suppression. Then expression of the polymerase rapidly rises, because the T7 polymerase expresses the suppressor gene as well as itself. In other preferred embodiments, only the suppressor gene is expressed from the T7 RNA polymerase promoter. Embodiments of this type would give a high level of suppressor without producing an excess amount of T7 RNA polymerase. In other preferred embodiments, the T7 RNA polymerase gene has more than one amber stop mutation (see, e.g., FIG. 14B). This will require higher levels of suppressor before active T7 RNA polymerase is produced.

In some embodiments of the present invention it may be desirable to have more than one stop codon suppressible by more than one suppressor tRNA. With reference to FIG. 15, a vector may be constructed so as to permit the regulatable expression of N- and/or C-terminal fusions of a protein of interest from the same construct. A first tag sequence, TAG1 in FIG. 15, is expressed from a promoter represented by an arrow in the figure. The tag sequence includes a stop codon in the same reading frame as the tag. The stop codon 1 may be located anywhere in the tag sequence and is preferably located at or near the C-terminal of the tag sequence. The stop codon may also be located in the recombination site RS, or in the internal ribosome entry sequence (IRES). The construct also includes a gene of interest (GENE) which includes a stop codon 2. The first tag and the gene of interest are preferably in the same reading frame although inclusion of a sequence that causes frame shifting to bring the first tag into the same reading frame as the gene of interest is within the scope of the present invention. Stop codon 2 is in the same reading frame as the gene of interest and is preferably located at or near the end of the coding sequence for the gene. Stop codon 2 may optionally be located within the recombination site RS₂. The construct also includes a second tag sequence in the same reading frame as the gene of interest indicated by TAG2 in FIG. 15 and the second tag sequence may optionally include a stop codon 3 in the same reading frame as the second tag. A transcription terminator may be included in the construct after the coding sequence of the second tag (not shown in FIG. 15). Stop codons 1, 2 and 3 may be the same or different. In some embodiments, stop codons 1, 2 and 3 are different. In embodiments where 1 and 2 are different, the same construct may be used to express an N-terminal fusion, a C-terminal fusion and the native protein by varying the expression of the appropriate suppressor tRNA. For example, to express the native protein, no suppressor tRNAs are expressed and protein translation is controlled by the IRES. When an N-terminal fusion is desired, a suppressor tRNA that suppresses stop codon 1 is expressed while a suppressor tRNA that suppresses stop codon 2 is expressed in order to produce a C-terminal fusion. In some instances it may be desirable to express a doubly tagged protein of interest in which case suppressor tRNAs that suppress both stop codon 1 and stop codon 2 may be expressed.
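The four outcomes described for the FIG. 15 construct can be summarized as a small truth table. The following sketch assumes stop codons 1 and 2 are different and each is read by its own suppressor; the function name is hypothetical, suppression is treated as all-or-none, and any native protein arising from IRES-driven initiation alongside a fusion is ignored.

```python
def expressed_product(suppress_stop1: bool, suppress_stop2: bool) -> str:
    """Expected major product of a TAG1-stop1-GENE-stop2-TAG2 construct.

    suppress_stop1 / suppress_stop2 indicate whether a suppressor tRNA
    reading stop codon 1 / stop codon 2 is expressed.
    """
    parts = []
    if suppress_stop1:
        parts.append("TAG1")  # read-through of stop codon 1: N-terminal tag
    parts.append("GENE")      # without suppressor 1, translation starts at the IRES
    if suppress_stop2:
        parts.append("TAG2")  # read-through of stop codon 2: C-terminal tag
    return "-".join(parts)


if __name__ == "__main__":
    for s1 in (False, True):
        for s2 in (False, True):
            print(s1, s2, expressed_product(s1, s2))
```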

The present invention has been described in some detail by way of illustration and example for purposes of clarity of understanding. It will be obvious to one of ordinary skill in the art that the same can be performed by modifying or changing the invention within a wide and equivalent range of conditions, formulations and other parameters without affecting the scope of the invention or any specific embodiment thereof, and that such modifications or changes are intended to be encompassed within the scope of the appended claims.

Example 7 Testing Functionality of Entry and Destination Vectors

As part of the assessment of the functionality of particular vectors of the invention, it is important to functionally test the ability of the vectors to recombine. This assessment can be carried out by performing a recombinational cloning reaction, transforming E. coli and scoring colony forming units. However, an alternative assay may also be performed to allow faster, simpler assessment of the functionality of a given Entry or Destination Vector by agarose gel electrophoresis. The following is a description of such an in vitro assay.

Materials and Methods:

Plasmid templates pEZC1301 and pEZC1313 (described in PCT Publication WO 00/52027, the entire disclosure of which is incorporated herein by reference), each containing a single wild-type att site, were used for the generation of PCR products containing attL or attR sites, respectively. Plasmid templates were linearized with AlwNI, phenol extracted, ethanol precipitated and dissolved in TE to a concentration of 1 ng/μl.

PCR Primers (Capital Letters Represent Base Changes from Wild-Type):

attL1 (SEQ ID NO: 41): gggg agcct gcttttttGtacAaa gttggcatta taaaaaagca ttgc
attL2 (SEQ ID NO: 42): gggg agcct gctttCttGtacAaa gttggcatta taaaaaagca ttgc
attL right (SEQ ID NO: 43): tgttgccggg aagctagagt aa
attR1 (SEQ ID NO: 44): gggg Acaag ttTgtaCaaaaaagc tgaacgaga aacgtaaaat
attR2 (SEQ ID NO: 45): gggg Acaag ttTgtaCaaGaaagc tgaacgaga aacgtaaaat
attR right (SEQ ID NO: 46): ca gacggcatga tgaacctgaa

PCR primers were dissolved in TE to a concentration of 500 pmol/μl. Primer mixes were prepared, consisting of attL1 + attL right primers, attL2 + attL right primers, attR1 + attR right primers, and attR2 + attR right primers, each mix containing 20 pmol/μl of each primer.
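The dilution from 500 pmol/μl stocks to a mix containing 20 pmol/μl of each primer follows from C1·V1 = C2·V2. A minimal sketch is given below; the function name is hypothetical, and the use of TE as the diluent for the mix is an assumption (the text states only that the stocks were dissolved in TE).

```python
def primer_mix_volumes(final_volume_ul: float,
                       stock_pmol_per_ul: float = 500.0,
                       mix_pmol_per_ul: float = 20.0) -> dict:
    """Volumes (in microliters) needed to combine two primer stocks into one mix."""
    # C1 * V1 = C2 * V2, solved for the stock volume V1 of each primer
    each_primer = final_volume_ul * mix_pmol_per_ul / stock_pmol_per_ul
    diluent = final_volume_ul - 2 * each_primer  # diluent assumed to be TE
    return {"primer_1": each_primer, "primer_2": each_primer, "TE": diluent}


if __name__ == "__main__":
    # e.g., 100 microliters of attL1 + attL right primer mix
    print(primer_mix_volumes(100.0))
```

One microliter of such a mix delivers the 20 pmoles of each primer listed for the PCR reactions.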

PCR Reactions:

1 μl plasmid template (1 ng)

1 μl primer pairs (20 pmoles of each)

3 μl of H₂O

45 μl of Platinum PCR SuperMix® (Invitrogen Corp., Carlsbad, Calif.)

Cycling Conditions (Performed in MJ Thermocycler):

95° C./2 minutes

94° C./30 seconds

25 cycles of 58° C./30 seconds and 72° C./1.5 minutes

72° C./5 minutes

5° C./hold

The resulting attL PCR product was 1.5 kb, and the resulting attR PCR product was 1.0 kb.

PCR reactions were PEG/MgCl₂ precipitated by adding 150 μl of H₂O and 100 μl of 3×PEG/MgCl₂ solution followed by centrifugation. The PCR products were dissolved in 50 μl of TE. Quantification of the PCR product was performed by gel electrophoresis of 1 μl and was estimated to be 50-100 ng/μl.

Recombination reactions of PCR products containing attL or attR sites with GATEWAY™ plasmids were performed as follows:

8 μl of H₂O

2 μl of attL or attR PCR product (100-200 ng)

2 μl of GATEWAY™ plasmid (100 ng)

4 μl of 5× Destination buffer

4 μl of GATEWAY™ LR Clonase™ Enzyme Mix

20 μl total volume (the reactions can be scaled down to a 5 μl total volume by adjusting the volumes of the components to about ¼ of those shown above, while keeping the stoichiometries the same).
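The scale-down noted above amounts to multiplying every component volume by the same factor. The following sketch uses the component volumes listed for the 20 μl reaction; the dictionary keys and the function name are illustrative labels only.

```python
# Component volumes (microliters) of the 20 microliter reaction listed above.
LR_REACTION_UL = {
    "H2O": 8.0,
    "attL or attR PCR product": 2.0,
    "GATEWAY plasmid": 2.0,
    "5x Destination buffer": 4.0,
    "LR Clonase Enzyme Mix": 4.0,
}


def scale_reaction(components: dict, target_total_ul: float) -> dict:
    """Scale all volumes by one factor so the stoichiometry is unchanged."""
    factor = target_total_ul / sum(components.values())
    return {name: volume * factor for name, volume in components.items()}


if __name__ == "__main__":
    for name, volume in scale_reaction(LR_REACTION_UL, 5.0).items():
        print(f"{volume:4.1f} ul  {name}")
```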

Clonase reactions were incubated at 25° C. for 2 hours. Two μl of proteinase K (2 mg/ml) was added to stop the reaction. Ten μl was then run on a 1% agarose gel. Positive control reactions were performed by reacting attL1 PCR product (1.5 kb) with attR1 PCR product (1.0 kb) and by similarly reacting attL2 PCR product with attR2 PCR product to observe the formation of a larger (2.5 kb) recombination product. Negative controls were similarly performed by reacting attL1 PCR product with attR2 PCR product and vice versa or reactions of attL PCR product with an attL plasmid, etc.
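The logic of the positive and negative controls can be captured in a few lines. The sketch below assumes the PCR product sizes reported in this Example (1.5 kb for attL, 1.0 kb for attR) and that an LR reaction requires one attL and one attR partner of matching specificity; the function name and the site-name parsing are hypothetical conveniences.

```python
PRODUCT_KB = {"attL": 1.5, "attR": 1.0}  # PCR product sizes from this Example


def expected_recombinant_kb(site_a: str, site_b: str):
    """Expected recombinant size in kb, or None when no product is expected.

    Site names are of the form 'attL1', 'attR2', etc.
    """
    kind_a, spec_a = site_a[:4], site_a[4:]
    kind_b, spec_b = site_b[:4], site_b[4:]
    if {kind_a, kind_b} != {"attL", "attR"}:
        return None  # e.g., attL x attL is not an LR pair
    if spec_a != spec_b:
        return None  # e.g., attL1 x attR2: mismatched specificity
    return PRODUCT_KB[kind_a] + PRODUCT_KB[kind_b]


if __name__ == "__main__":
    for pair in [("attL1", "attR1"), ("attL2", "attR2"),
                 ("attL1", "attR2"), ("attL1", "attL1")]:
        print(pair, expected_recombinant_kb(*pair))
```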

In alternative assays, to test attB Entry vectors, plasmids containing single attP sites were used. Plasmids containing single att sites could also be used as recombination substrates in general to test all Entry and Destination vectors (i.e., those containing attL, attR, attB and attP sites). This would eliminate the need to do PCR reactions.

Results:

Destination and Entry plasmids when reacted with appropriate att-containing PCR products formed linear recombinant molecules that could be easily visualized on an agarose gel when compared to control reactions containing no attL or attR PCR product. Thus, the functionality of Destination and Entry vectors constructed according to the invention may be determined, for example, by carrying out the linearization assay described above.

Example 8

PCR Cloning Using Universal Adapter-Primers

As described herein, the cloning of PCR products using the GATEWAY™ PCR Cloning System (Invitrogen Corp., Carlsbad, Calif.) requires the addition of attB sites (attB1 and attB2) to the ends of gene-specific primers used in the PCR reaction. Available data suggested that the user add 29 bp (25 bp containing the attB site plus four G residues) to the gene-specific primer. It would be advantageous to high volume users of the GATEWAY™ PCR Cloning System to generate attB-containing PCR product using universal attB adapter-primers in combination with shorter gene-specific primers containing a specified overlap to the adapters. The following experiments demonstrate the utility of this strategy using universal attB adapter-primers and gene-specific primers containing overlaps of various lengths from 6 bp to 18 bp. The results demonstrate that gene-specific primers with overlaps of 10 bp to 18 bp can be used successfully in PCR amplifications with universal attB adapter-primers to generate full-length PCR products. These PCR products can then be successfully cloned with high fidelity in a specified orientation using the GATEWAY™ PCR Cloning System.

Methods and Results:

To demonstrate that universal attB adapter-primers can be used with gene-specific primers containing partial attB sites in PCR reactions to generate full-length PCR product, a small 256 bp region of the human hemoglobin cDNA was chosen as a target so that intermediate sized products could be distinguished from full-length products by agarose gel electrophoresis.

The following oligonucleotides were used:

B1-Hgb: (SEQ ID NO: 47) GGGG ACA AGT TTG TAC AAA AAA GCA GGC T-5′-Hgb*
B2-Hgb: (SEQ ID NO: 48) GGGG ACC ACT TTG TAC AAG AAA GCT GGG T-3′-Hgb**
18B1-Hgb: (SEQ ID NO: 49) TG TAC AAA AAA GCA GGC T-5′-Hgb
18B2-Hgb: (SEQ ID NO: 50) TG TAC AAG AAA GCT GGG T-3′-Hgb
15B1-Hgb: (SEQ ID NO: 51) AC AAA AAA GCA GGC T-5′-Hgb
15B2-Hgb: (SEQ ID NO: 52) AC AAG AAA GCT GGG T-3′-Hgb
12B1-Hgb: (SEQ ID NO: 53) AA AAA GCA GGC T-5′-Hgb
12B2-Hgb: (SEQ ID NO: 54) AG AAA GCT GGG T-3′-Hgb
11B1-Hgb: (SEQ ID NO: 55) A AAA GCA GGC T-5′-Hgb
11B2-Hgb: (SEQ ID NO: 56) G AAA GCT GGG T-3′-Hgb
10B1-Hgb: (SEQ ID NO: 57) AAA GCA GGC T-5′-Hgb
10B2-Hgb: (SEQ ID NO: 58) AAA GCT GGG T-3′-Hgb
9B1-Hgb: AA GCA GGC T-5′-Hgb
9B2-Hgb: AA GCT GGG T-3′-Hgb
8B1-Hgb: A GCA GGC T-5′-Hgb
8B2-Hgb: A GCT GGG T-3′-Hgb
7B1-Hgb: GCA GGC T-5′-Hgb
7B2-Hgb: GCT GGG T-3′-Hgb
6B1-Hgb: CA GGC T-5′-Hgb
6B2-Hgb: CT GGG T-3′-Hgb
attB1 adapter: (SEQ ID NO: 47) GGGG ACA AGT TTG TAC AAA AAA GCA GGC T
attB2 adapter: (SEQ ID NO: 48) GGGG ACC ACT TTG TAC AAG AAA GCT GGG T
*-5′-Hgb = (SEQ ID NO: 59) GTC ACT AGC CTG TGG AGC AAG A
**-3′-Hgb = (SEQ ID NO: 60) AGG ATG GCA GAG GGA GAC GAC A
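Each overlap primer in the list above is the 3′-terminal portion of the corresponding universal attB adapter joined to the gene-specific sequence. As an illustrative sketch (the function and constant names are hypothetical; the sequences are those of SEQ ID NOs: 47, 48, 59 and 60 as given in this Example), the primers can be generated as follows:

```python
ATTB1_ADAPTER = "GGGGACAAGTTTGTACAAAAAAGCAGGCT"  # SEQ ID NO: 47
ATTB2_ADAPTER = "GGGGACCACTTTGTACAAGAAAGCTGGGT"  # SEQ ID NO: 48
HGB_5PRIME = "GTCACTAGCCTGTGGAGCAAGA"            # SEQ ID NO: 59
HGB_3PRIME = "AGGATGGCAGAGGGAGACGACA"            # SEQ ID NO: 60


def overlap_primer(adapter: str, overlap_bp: int, gene_specific: str) -> str:
    """Join the last overlap_bp bases of an adapter to a gene-specific primer."""
    if not 1 <= overlap_bp <= len(adapter):
        raise ValueError("overlap must be between 1 and the adapter length")
    return adapter[-overlap_bp:] + gene_specific


if __name__ == "__main__":
    for n in (18, 15, 12, 11, 10):
        print(f"{n}B1-Hgb:", overlap_primer(ATTB1_ADAPTER, n, HGB_5PRIME))
        print(f"{n}B2-Hgb:", overlap_primer(ATTB2_ADAPTER, n, HGB_3PRIME))
```

With a 12 bp overlap, each gene-specific primer carries 17 fewer added bases than the full 29-base attB addition.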

The aim of these experiments was to develop a simple and efficient universal adapter PCR method to generate attB containing PCR products suitable for use in the GATEWAY™ PCR Cloning System. The reaction mixtures and thermocycling conditions should be simple and efficient so that the universal adapter PCR method could be routinely applicable to any PCR product cloning application.

PCR reaction conditions were initially found that could successfully amplify predominately full-length PCR product using gene-specific primers containing 18 bp and 15 bp overlap with universal attB primers. These conditions are outlined below:

10 pmoles of gene-specific primers

10 pmoles of universal attB adapter-primers

1 ng of plasmid containing the human hemoglobin cDNA.

100 ng of human leukocyte cDNA library DNA.

5 μl of 10× PLATINUM Taq HiFi® reaction buffer (Invitrogen Corp., Carlsbad, Calif.)

2 μl of 50 mM MgSO₄

1 μl of 10 mM dNTPs

0.2 μl of PLATINUM Taq HiFi® (1.0 unit)

H₂O to 50 μl total reaction volume

Cycling Conditions:

95° C./5 min

25 cycles of 94° C./15 sec, 50° C./30 sec and 68° C./1 min

68° C./5 min

5° C./hold

To assess the efficiency of the method, 2 μl (1/25) of the 50 μl PCR reaction was electrophoresed in a 3% Agarose-1000 gel. With overlaps of 12 bp or less, smaller intermediate products containing one or no universal attB adapter predominated in the reactions. Further optimization of PCR reaction conditions was obtained by titrating the amounts of gene-specific primers and universal attB adapter-primers. The PCR reactions were set up as outlined above except that the amounts of primers added were:

0, 1, 3 or 10 pmoles of gene-specific primers

0, 10, 30 or 100 pmoles of adapter-primers

Cycling Conditions:

95° C./3 min

25 cycles of 94° C./15 sec, 48° C./45 sec and 68° C./1 min

68° C./5 min

5° C./hold

The use of limiting amounts of gene-specific primers (3 pmoles) and excess adapter-primers (30 pmoles) reduced the amounts of smaller intermediate products. Using these reaction conditions the overlap necessary to obtain predominately full-length PCR product was reduced to 12 bp. The amounts of gene-specific and adapter-primers were further optimized in the following PCR reactions:

0, 1, 2 or 3 pmoles of gene-specific primers

0, 30, 40 or 50 pmoles of adapter-primers

Cycling Conditions:

95° C./3 min

25 cycles of 94° C./15 sec, 48° C./1 min and 68° C./1 min

68° C./5 min

5° C./hold

The use of 2 pmoles of gene-specific primers and 40 pmoles of adapter-primers further reduced the amounts of intermediate products and generated predominately full-length PCR products with gene-specific primers containing an 11 bp overlap. The success of the PCR reactions can be assessed in any PCR application by performing a no adapter control. The use of limiting amounts of gene-specific primers should give faint or barely visible bands when 1/25 to 1/10 of the PCR reaction is electrophoresed on a standard agarose gel. Addition of the universal attB adapter-primers should generate a robust PCR reaction with a much higher overall yield of product.

PCR products from reactions using the 18 bp, 15 bp, 12 bp, 11 bp and 10 bp overlap gene-specific primers were purified using the CONCERT™ Rapid PCR Purification System (PCR products greater than 500 bp can be PEG precipitated). The purified PCR products were subsequently cloned into an attP containing plasmid vector using the GATEWAY™ PCR Cloning System (Invitrogen Corp., Carlsbad, Calif.) and transformed into E. coli. Colonies were selected and counted on the appropriate antibiotic media and screened by PCR for correct inserts and orientation.

Raw PCR products (unpurified) from the attB adapter PCR of a plasmid clone of part of the human beta-globin (Hgb) gene were also used in GATEWAY™ PCR Cloning System reactions. PCR products generated with the full attB B1/B2-Hgb, the 12B1/B2, 11B1/B2 and 10B1/B2 attB overlap Hgb primers were successfully cloned into the GATEWAY™ pENTR21 attP vector (described in PCT Publication WO 00/52027, the entire disclosure of which is incorporated herein by reference). Twenty-four colonies from each (24×4=96 total) were tested and each was verified by PCR to contain correct inserts. The cloning efficiency expressed as cfu/ml is shown below:

Primer Used          cfu/ml
Hgb full attB        8,700
Hgb 12 bp overlap    21,000
Hgb 11 bp overlap    20,500
Hgb 10 bp overlap    13,500
GFP control          1,300

Interestingly, the overlap PCR products cloned with higher efficiency than did the full attB PCR product. Presumably, and as verified by visualization on agarose gel, the adapter PCR products were slightly cleaner than was the full attB PCR product. The differences in colony output may also reflect the proportion of PCR product molecules with intact attB sites.
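The relative cloning efficiencies can be made explicit by normalizing each cfu/ml value to that obtained with the full attB primers. The sketch below simply restates the reported values; the function name is hypothetical, and no statistical significance is implied by the ratios.

```python
# cfu/ml values as reported in this Example.
CFU_PER_ML = {
    "Hgb full attB": 8700,
    "Hgb 12 bp overlap": 21000,
    "Hgb 11 bp overlap": 20500,
    "Hgb 10 bp overlap": 13500,
    "GFP control": 1300,
}


def fold_over(reference: str, table: dict) -> dict:
    """Express each cfu/ml value as a multiple of the reference entry."""
    ref = table[reference]
    return {name: round(cfu / ref, 2) for name, cfu in table.items()}


if __name__ == "__main__":
    for name, fold in fold_over("Hgb full attB", CFU_PER_ML).items():
        print(f"{name:20s} {fold:5.2f}x")
```

On these numbers the 12 bp and 11 bp overlap products yielded roughly 2.4-fold as many colonies as the full attB product, and the 10 bp overlap product roughly 1.6-fold.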

Using the attB adapter PCR method, PCR primers with 12 bp attB overlaps were used to amplify cDNAs of different sizes (ranging from 1 to 4 kb) from a leukocyte cDNA library and from first strand cDNA prepared from HeLa total RNA. While three of the four cDNAs were able to be amplified by this method, a non-specific amplification product was also observed that under some conditions would interfere with the gene-specific amplification. This non-specific product was amplified in reactions containing the attB adapter-primers alone without any gene-specific overlap primers present. The non-specific amplification product was reduced by increasing the stringency of the PCR reaction and lowering the attB adapter PCR primer concentration.

These results indicate that the adapter-primer PCR approach described in this Example will work well for cloned genes. These results also demonstrate the development of a simple and efficient method to amplify PCR products that are compatible with the GATEWAY™ PCR Cloning System that allows the use of shorter gene-specific primers that partially overlap universal attB adapter-primers. In routine PCR cloning applications, the use of 12 bp overlaps is recommended. The methods described in this Example can thus reduce the length of gene-specific primers by up to 17 residues or more, resulting in a significant savings in oligonucleotide costs for high volume users of the GATEWAY™ PCR Cloning System. In addition, using the methods and assays described in this Example, one of ordinary skill can, using only routine experimentation, design and use analogous primer-adapters based on or containing other recombination sites or fragments thereof, such as attL, attR, attP, lox, FRT, etc.

As an alternative to adding 29 bases to the ends of PCR primers, attB PCR products can be generated with primers containing as few as 12 bases of attB added to template-specific primers using a two-step PCR protocol. In the first step, template-specific primers containing 12 bases of attB are used in 10 cycles of PCR to amplify the target gene. A portion of this PCR reaction is transferred to a second PCR reaction containing universal attB adapter primers to amplify the full-attB PCR product.

Template-specific primers with 12 bases of attB1 and attB2 at their 5′-ends are designed as shown below:

12 attB1: AA AAA GCA GGC TNN (SEQ ID NO: 139) - forward template-specific primer
12 attB2: A GAA AGC TGG GTN (SEQ ID NO: 140) - reverse template-specific primer

The template-specific part of the primers is generally designed to have a Tm of greater than 50° C. The optimal annealing temperature is determined by the Tm of the template-specific part of the primer.
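The text does not state how the Tm is to be calculated. Purely as an illustration, the sketch below applies the common rule-of-thumb estimate of 2° C. per A/T and 4° C. per G/C to the template-specific part of a primer; this rule is an assumption introduced here, it is only a rough approximation (particularly for longer oligonucleotides), and nearest-neighbor methods would generally be preferred for actual primer design. The function names are hypothetical.

```python
def rule_of_thumb_tm(template_specific: str) -> int:
    """Rough Tm estimate in deg C: 2 per A or T plus 4 per G or C.

    This formula is an assumption for illustration; the source text does
    not specify a Tm calculation method.
    """
    seq = template_specific.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def meets_design_rule(template_specific: str, minimum_tm: float = 50.0) -> bool:
    """Check the 'Tm greater than 50 deg C' guideline from the text."""
    return rule_of_thumb_tm(template_specific) > minimum_tm


if __name__ == "__main__":
    hgb_5prime = "GTCACTAGCCTGTGGAGCAAGA"  # 5'-Hgb sequence (SEQ ID NO: 59)
    print(rule_of_thumb_tm(hgb_5prime), meets_design_rule(hgb_5prime))
```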

attB1 adapter primer: GGGGACAAGTTTGTACAAAAAAGCAGGCT (SEQ ID NO: 47)
attB2 adapter primer: GGGGACCACTTTGTACAAGAAAGCTGGGT (SEQ ID NO: 48)

A 50 μl PCR reaction containing 10 pmoles of each template-specific primer and the appropriate amount of template DNA is prepared. Tubes containing this PCR reaction mixture are placed in a thermal cycler at 95° C. and incubated for 2 minutes.

Ten cycles of PCR are performed as follows:

Denature 94° C. for 15 seconds

Anneal 50-60° C. for 30 seconds

Extend 68° C. for 1 minute/kb of target amplicon

Ten μl of the PCR reaction product is transferred to a 40 μl PCR reaction mixture containing 40 pmoles each of the attB1 and attB2 adapter primers. Tubes containing this mixture are then placed in a thermal cycler at 95° C. and incubated for 1 minute.

Five cycles of PCR are performed as follows:

Denature 94° C. for 15 seconds

Anneal 45° C. for 30 seconds

Extend 68° C. for 1 minute/kb of target amplicon

Fifteen to twenty cycles of PCR are then performed as follows:

Denature 94° C. for 15 seconds

Anneal 55° C. for 30 seconds

Extend 68° C. for 1 minute/kb of target amplicon

The amplification products are then analyzed by agarose gel electrophoresis.
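The two-step protocol described above can be written down as a small data structure, which makes the cycle counts and the amplicon-dependent extension time explicit. This is only a sketch: the class and function names are hypothetical, the default first-step annealing temperature of 55° C. is an arbitrary choice within the stated 50-60° C. range, and the initial denaturation holds (2 minutes and 1 minute at 95° C.) are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    label: str
    cycles: int
    anneal_c: float  # annealing temperature, deg C


def two_step_program(amplicon_kb: float,
                     first_anneal_c: float = 55.0,
                     final_cycles: int = 15):
    """Return (stages, extension_minutes, total_cycles) for the protocol."""
    if not 50.0 <= first_anneal_c <= 60.0:
        raise ValueError("first-step annealing is 50-60 deg C in the text")
    if not 15 <= final_cycles <= 20:
        raise ValueError("final stage is 15 to 20 cycles in the text")
    stages = [
        Stage("step 1: template-specific primers", 10, first_anneal_c),
        Stage("step 2a: attB adapter primers", 5, 45.0),
        Stage("step 2b: attB adapter primers", final_cycles, 55.0),
    ]
    extension_minutes = 1.0 * amplicon_kb  # 68 deg C, 1 minute per kb
    total_cycles = sum(stage.cycles for stage in stages)
    return stages, extension_minutes, total_cycles


if __name__ == "__main__":
    stages, extension, total = two_step_program(amplicon_kb=2.0, final_cycles=20)
    for stage in stages:
        print(stage)
    print("extension (min):", extension, "total cycles:", total)
```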

Example 9

Mutational Analysis of the Bacteriophage Lambda attL and attR Sites: Determinants of att Site Specificity in Site-Specific Recombination

To investigate the determinants of att site specificity, the bacteriophage lambda attL and attR sites were systematically mutagenized and examined to define precisely which mutations produce unique changes in att site specificity. As noted herein, the determinants of specificity have previously been localized to the 7 bp overlap region (TTTATAC, which is defined by the cut sites for the integrase protein and is the region where strand exchange takes place) within the 15 bp core region (GCTTTTTTATACTAA (SEQ ID NO:37)) that is identical in all four lambda att sites, attB, attP, attL and attR.

Therefore, to examine the effect of att sequence on site specificity, mutant attL and attR sites were generated by PCR and tested in an in vitro site-specific recombination assay. In this way all possible single base pair changes within the 7 bp overlap region of the core att site were generated as well as five additional changes outside the 7 bp overlap but within the 15 bp core att site. Each attL PCR substrate was tested in the in vitro recombination assay with each of the attR PCR substrates.
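"All possible single base pair changes" within the 7 bp overlap corresponds to 7 positions × 3 alternative bases = 21 mutants. The following sketch enumerates them within the 15 bp core and names them in the style of the attL primers listed in this Example (wild-type base, position, new base, with the changed base in upper case); the function and constant names are hypothetical.

```python
OVERLAP = "tttatac"                     # 7 bp overlap region, wild type
CORE_LEFT, CORE_RIGHT = "gcttt", "taa"  # remainder of the 15 bp core


def single_base_mutants(overlap: str = OVERLAP) -> dict:
    """Map mutant names (e.g., 'attLT1A') to their 15 bp core sequences."""
    mutants = {}
    for position, wild_type in enumerate(overlap, start=1):
        for alternative in "acgt":
            if alternative == wild_type:
                continue
            name = f"attL{wild_type.upper()}{position}{alternative.upper()}"
            mutated = (overlap[:position - 1] + alternative.upper()
                       + overlap[position:])
            mutants[name] = CORE_LEFT + mutated + CORE_RIGHT
    return mutants


if __name__ == "__main__":
    table = single_base_mutants()
    print(len(table))  # 7 positions x 3 alternatives
    for name, core in table.items():
        print(name, core)
```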

Methods

To examine both the efficiency and specificity of recombination of mutant attL and attR sites, a simple in vitro site-specific recombination assay was developed. Since the core regions of attL and attR lie near the ends of these sites, it was possible to incorporate the desired nucleotide base changes within PCR primers and generate a series of PCR products containing mutant attL and attR sites. PCR products containing attL and attR sites were used as substrates in an in vitro reaction with GATEWAY™ LR CLONASE™ Enzyme Mix (Invitrogen Corp., Carlsbad, Calif.). Recombination between a 1.5 kb attL PCR product and a 1.0 kb attR PCR product resulted in a 2.5 kb recombinant molecule that was monitored using agarose gel electrophoresis and ethidium bromide staining.

Plasmid templates pEZC1301 and pEZC1313 (described in PCT Publication WO 00/52027, the entire disclosure of which is incorporated herein by reference), each containing a single wild-type attL or attR site, respectively, were used for the generation of recombination substrates. The following list shows primers used in PCR reactions to generate the attL PCR products that were used as substrates in LR CLONASE™ reactions (capital letters represent changes from the wild-type sequence, and the underline represents the 7 bp overlap region within the 15 bp core att site; a similar set of PCR primers was used to prepare the attR PCR products containing matching mutations):

GATEWAY™ sites (note: attL2 sequence in GATEWAY™ plasmids begins “accca” while the attL2 site in this example begins “agcct” to reflect wild-type attL outside the core region.):

attL1: (SEQ ID NO: 41) gggg agcct gcttttttGtacAaa gttggcatta taaaaaagca ttgc
attL2: (SEQ ID NO: 42) gggg agcct gctttCttGtacAaa gttggcatta taaaaaagca ttgc

Wild-Type:

attL0: (SEQ ID NO: 61) gggg agcct gcttttttatactaa gttggcatta taaaaaagca ttgc

Single Base Changes from Wild-Type:

attLT1A: (SEQ ID NO: 62) gggg agcct gcttt Attatactaa gttggcatta taaaaaagca ttgc
attLT1C: (SEQ ID NO: 63) gggg agcct gcttt Cttatactaa gttggcatta taaaaaagca ttgc
attLT1G: (SEQ ID NO: 64) gggg agcct gcttt Gttatactaa gttggcatta taaaaaagca ttgc
attLT2A: (SEQ ID NO: 65) gggg agcct gcttt tAtatactaa gttggcatta taaaaaagca ttgc
attLT2C: (SEQ ID NO: 66) gggg agcct gcttt tCtatactaa gttggcatta taaaaaagca ttgc
attLT2G: (SEQ ID NO: 67) gggg agcct gcttt tGtatactaa gttggcatta taaaaaagca ttgc
attLT3A: (SEQ ID NO: 68) gggg agcct gcttt ttAatactaa gttggcatta taaaaaagca ttgc
attLT3C: (SEQ ID NO: 69) gggg agcct gcttt ttCatactaa gttggcatta taaaaaagca ttgc
attLT3G: (SEQ ID NO: 70) gggg agcct gcttt ttGatactaa gttggcatta taaaaaagca ttgc
attLA4C: (SEQ ID NO: 71) gggg agcct gcttt tttCtactaa gttggcatta taaaaaagca ttgc
attLA4G: (SEQ ID NO: 72) gggg agcct gcttt tttGtactaa gttggcatta taaaaaagca ttgc
attLA4T: (SEQ ID NO: 73) gggg agcct gcttt tttTtactaa gttggcatta taaaaaagca ttgc
attLT5A: (SEQ ID NO: 74) gggg agcct gcttt tttaAactaa gttggcatta taaaaaagca ttgc
attLT5C: (SEQ ID NO: 75) gggg agcct gcttt tttaCactaa gttggcatta taaaaaagca ttgc
attLT5G: (SEQ ID NO: 76) gggg agcct gcttt tttaGactaa gttggcatta taaaaaagca ttgc
attLA6C: (SEQ ID NO: 77) gggg agcct gcttt tttatCctaa gttggcatta taaaaaagca ttgc
attLA6G: (SEQ ID NO: 78) gggg agcct gcttt tttatGctaa gttggcatta taaaaaagca ttgc
attLA6T: (SEQ ID NO: 79) gggg agcct gcttt tttatTctaa gttggcatta taaaaaagca ttgc
attLC7A: (SEQ ID NO: 80) gggg agcct gcttt tttataAtaa gttggcatta taaaaaagca ttgc
attLC7G: (SEQ ID NO: 81) gggg agcct gcttt tttataGtaa gttggcatta taaaaaagca ttgc
attLC7T: (SEQ ID NO: 82) gggg agcct gcttt tttataTtaa gttggcatta taaaaaagca ttgc

Single Base Changes Outside of the 7 bp Overlap:

attL8: (SEQ ID NO: 83) gggg agcct Acttt tttatactaa gttggcatta taaaaaagca ttgc
attL9: (SEQ ID NO: 84) gggg agcct gcCtt tttatactaa gttggcatta taaaaaagca ttgc
attL10: (SEQ ID NO: 85) gggg agcct gcttC tttatactaa gttggcatta taaaaaagca ttgc
attL14: (SEQ ID NO: 86) gggg agcct gcttt tttatacCaa gttggcatta taaaaaagca ttgc
attL15: (SEQ ID NO: 87) gggg agcct gcttt tttatactaG gttggcatta taaaaaagca ttgc

Note: Additional vectors wherein the first nine bases are gggg agcca (i.e., substituting an adenine for the thymine in the position immediately preceding the 15-bp core region), which may or may not contain the single base pair substitutions (or deletions) outlined above, can also be used in these experiments.
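The set of single-base changes within the 7 bp overlap can be enumerated programmatically. The following is a minimal sketch, not the procedure used to design the primers: the function name and the 0-based offset of the overlap within the core are illustrative assumptions, and only the 15 bp core (not the full primer) is generated.

```python
# Wild-type 15 bp core shared by attB, attP, attL and attR (from the text).
CORE = "gcttttttatactaa"
OVERLAP_START = 5   # assumption: 0-based index of the first overlap base
OVERLAP_LEN = 7     # the 7 bp overlap, tttatac


def single_base_overlap_mutants(core=CORE):
    """Return {name: mutant core} for every single-base change in the overlap.

    Names follow the pattern of the primer list, e.g. attLT1A means the T at
    overlap position 1 was changed to A. The changed base is in upper case.
    """
    mutants = {}
    for pos in range(OVERLAP_LEN):
        idx = OVERLAP_START + pos
        wild = core[idx]
        for base in "acgt":
            if base == wild:
                continue  # skip the wild-type base
            name = f"attL{wild.upper()}{pos + 1}{base.upper()}"
            mutants[name] = core[:idx] + base.upper() + core[idx + 1:]
    return mutants


if __name__ == "__main__":
    for name, seq in single_base_overlap_mutants().items():
        print(name, seq)
```

Seven positions with three alternative bases each give the 21 single-base overlap mutants listed above.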

Recombination reactions of attL- and attR-containing PCR products were performed as follows:

8 μl of H₂O

2 μl of attL PCR product (100 ng)

2 μl of attR PCR product (100 ng)

4 μl of 5× buffer

4 μl of GATEWAY™ LR CLONASE™ Enzyme Mix

20 μl total volume

CLONASE™ reactions were incubated at 25° C. for 2 hours.

2 μl of 10× CLONASE™ stop solution (proteinase K, 2 mg/ml) were added to stop the reaction.

10 μl of the reaction mixtures were run on a 1% agarose gel.
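Because every attL substrate is tested against every attR substrate, many identical reactions are assembled in parallel. The sketch below checks that the listed volumes add up to the stated total and scales the shared components. Pooling only water and buffer, and the 10% overage, are assumptions for illustration and are not part of the protocol.

```python
# Per-reaction volumes in microliters, taken from the protocol above.
PER_REACTION_UL = {
    "H2O": 8.0,
    "attL PCR product (100 ng)": 2.0,
    "attR PCR product (100 ng)": 2.0,
    "5x buffer": 4.0,
    "LR CLONASE Enzyme Mix": 4.0,
}

# Assumption: only water and buffer are pooled; substrates and enzyme
# are added to each tube individually.
SHARED = ("H2O", "5x buffer")


def reaction_total():
    """Total volume of one reaction in microliters."""
    return sum(PER_REACTION_UL.values())


def master_mix(n_reactions, overage=0.10):
    """Volumes of shared components for n reactions, with pipetting overage."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(PER_REACTION_UL[name] * factor, 2) for name in SHARED}


if __name__ == "__main__":
    print("per-reaction total (ul):", reaction_total())
    print("master mix for 24 reactions:", master_mix(24))
```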

Results

Each attL PCR substrate was tested in the in vitro recombination assay with each of the attR PCR substrates. The results indicate that changes within the first three positions of the 7 bp overlap (TTTATAC) strongly altered the specificity of recombination. These mutant att sites each recombined as well as the wild-type, but only with their cognate partner mutant; they did not recombine detectably with any other att site mutant. In contrast, changes in the last four positions (TTTATAC) only partially altered specificity; these mutants recombined with their cognate mutant as well as wild-type att sites and recombined partially with all other mutant att sites except for those having mutations in the first three positions of the 7 bp overlap. Changes outside of the 7 bp overlap were found not to affect specificity of recombination, but some did influence the efficiency of recombination.

Based on these results, the following rules for att site specificity were determined:

Only changes within the 7 bp overlap affect specificity.

Changes within the first 3 positions strongly affect specificity.

Changes within the last 4 positions weakly affect specificity.
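These rules can be expressed as a small decision function. This is a simplified sketch of the qualitative rules only: the function name and the three outcome labels are illustrative assumptions, pairs that differ only in the last four positions (including a mutant paired with wild-type) are treated as partial cross-reaction, and efficiency effects are ignored.

```python
WILD_TYPE_OVERLAP = "TTTATAC"


def predicted_recombination(overlap_a, overlap_b):
    """Classify a pair of 7 bp overlaps as 'full', 'partial' or 'none'.

    - identical overlaps (cognate partners): full recombination
    - any difference in the first three positions: no detectable recombination
    - differences confined to the last four positions: partial cross-reaction
    """
    a, b = overlap_a.upper(), overlap_b.upper()
    if len(a) != 7 or len(b) != 7:
        raise ValueError("expected two 7 bp overlap sequences")
    if a == b:
        return "full"
    if a[:3] != b[:3]:
        return "none"
    return "partial"


if __name__ == "__main__":
    print(predicted_recombination("ATTATAC", "ATTATAC"))          # cognate pair
    print(predicted_recombination("ATTATAC", WILD_TYPE_OVERLAP))  # first-three change
    print(predicted_recombination("TTTGTAC", WILD_TYPE_OVERLAP))  # last-four change
```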

Mutations that affected the overall efficiency of the recombination reaction were also assessed by this method. In these experiments, a slightly increased (less than 2-fold) recombination efficiency with attLT1A and attLC7T substrates was observed when these substrates were reacted with their cognate attR partners. Also observed were mutations that decreased recombination efficiency (approximately 2-3 fold), including attLA6G, attL14 and attL15. These mutations presumably reflect changes that affect Int protein binding at the core att site.

The results of these experiments demonstrate that changes within the first three positions of the 7 bp overlap (TTTATAC) strongly altered the specificity of recombination (i.e., att sequences with one or more mutations in the first three thymidines would only recombine with their cognate partners and would not cross-react with any other att site mutation). In contrast, mutations in the last four positions (TTTATAC) only partially altered specificity (i.e., att sequences with one or more mutations in the last four base positions would cross-react partially with the wild-type att site and all other mutant att sites, except for those having mutations in one or more of the first three positions of the 7 bp overlap). Mutations outside of the 7 bp overlap were not found to affect specificity of recombination, but some were found to influence (i.e., to cause a decrease in) the efficiency of recombination.

Example 10

Discovery of att Site Mutations That Increase the Cloning Efficiency of GATEWAY™ Cloning Reactions

In experiments designed to understand the determinants of att site specificity, point mutations in the core region of attL were made. Nucleic acid molecules containing these mutated attL sequences were then reacted in an LR reaction with nucleic acid molecules containing the cognate attR site (i.e., an attR site containing a mutation corresponding to that in the attL site), and recombinational efficiency was determined as described above. Several mutations located in the core region of the att site were noted that either slightly increased (less than 2-fold) or decreased (between 2- and 4-fold) the efficiency of the recombination reaction (Table 5).

TABLE 5 Effects of attL mutations on Recombination Reactions.

Site       Sequence                         SEQ ID   Effect on Recombination
attL0      agcctgcttttttatactaagttggcatta   88       slightly increased
attL5      agcctgctttAttatactaagttggcatta   89       slightly increased
attL6      agcctgcttttttataTtaagttggcatta   90       decreased
attL13     agcctgcttttttatGctaagttggcatta   91       decreased
attL14     agcctgcttttttatacCaagttggcatta   92       decreased
attL15     agcctgcttttttatactaGgttggcatta   93       decreased
consensus  CAACTTnnTnnnAnnAAGTTG            94       N/A

It was also noted that these mutations presumably reflected changes that either increased or decreased, respectively, the relative affinity of the integrase protein for binding the core att site. A consensus sequence for an integrase core-binding site (CAACTTNNT) has been inferred in the literature but not directly tested (see, e.g., Ross and Landy, Cell 33:261-272 (1983)). This consensus core integrase-binding sequence was established by comparing the sequences of each of the four core att sites found in attP and attB as well as the sequences of five non-att sites that resemble the core sequence and to which integrase has been shown to bind in vitro. These experiments suggest that many more att site mutations might be identified which increase the binding of integrase to the core att site and thus increase the efficiency of GATEWAY™ cloning reactions.

Example 11

Effects of Core Region Mutations on Recombination Efficiency

To directly compare the cloning efficiency of mutations in the att site core region, single base changes were made in the attB2 site of an attB1-tet-attB2 PCR product. Nucleic acid molecules containing these mutated attB2 sequences were then reacted in a BP reaction with nucleic acid molecules containing non-cognate attP sites (i.e., wild-type attP2), and recombinational efficiency was determined as described above. The cloning efficiencies of these mutant attB2-containing PCR products compared to the standard attB1-tet-attB2 PCR product are shown in Table 6.

TABLE 6 Efficiency of Recombination With Mutated attB2 Sites.

Site      Sequence                        SEQ ID NO.  Mutation  Cloning Efficiency
attB0     tcaagttagtataaaaaagcaggct       95
attB1     ggggacaagtttgtacaaaaaagcaggct   47
attB2     ggggaccactttgtacaagaaagctgggt   48                    100.00%
attB2.1   ggggaAcactttgtacaagaaagctgggt   96          C→A       40%
attB2.2   ggggacAactttgtacaagaaagctgggt   97          C→A       131%
attB2.3   ggggaccCctttgtacaagaaagctgggt   98          A→C       4%
attB2.4   ggggaccaAtttgtacaagaaagctgggt   99          C→A       11%
attB2.5   ggggaccacGttgtacaagaaagctgggt   100         T→G       4%
attB2.6   ggggaccactGtgtacaagaaagctgggt   101         T→G       6%
attB2.7   ggggaccacttGgtacaagaaagctgggt   102         T→G       1%
attB2.8   ggggaccactttTtacaagaaagctgggt   103         G→T       0.5%

As noted above, a single base change in the attB2.2 site increased the cloning efficiency of the attB1-tet-attB2.2 PCR product to 131% compared to the attB1-tet-attB2 PCR product. Interestingly, this mutation changes the integrase core binding site of attB2 to a sequence that matches more closely the proposed consensus sequence.

Additional experiments were performed to directly compare the cloning efficiency of an attB1-tet-attB2 PCR product with a PCR product that contained attB sites containing the proposed consensus sequence of an integrase core binding site. The following attB sites were used to amplify attB-tet PCR products:

attB1 (SEQ ID NO: 47) ggggacaagtttgtacaaaaaagcaggct
attB1.6 (SEQ ID NO: 104) ggggacaaCtttgtacaaaaaagTTggct
attB2 (SEQ ID NO: 48) ggggaccactttgtacaagaaagctgggt
attB2.10 (SEQ ID NO: 105) ggggacAactttgtacaagaaagTtgggt

BP reactions were carried out between 300 ng (100 fmoles) of pDONR201 (Invitrogen Corp., Carlsbad, Calif., Cat. No. 11798-014) and 80 ng (80 fmoles) of attB-tet PCR product in a 20 μl volume with incubation for 1.5 hours at 25° C., creating pENTR201-tet Entry clones. A comparison of the cloning efficiencies of the above-noted attB sites in BP reactions is shown in Table 7.
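The mass and molar amounts quoted for the BP reaction are related through the length of each DNA molecule. The sketch below assumes an average mass of 660 g/mol per base pair of double-stranded DNA, a common approximation that is not stated in the text; the function names are illustrative, and the lengths it reports are only those implied by the quoted ng and fmol values.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair; approximation, not from the text


def ng_to_fmol(ng, length_bp):
    """Convert a mass of double-stranded DNA (ng) to femtomoles."""
    return ng * 1e6 / (length_bp * AVG_BP_MASS)


def implied_length_bp(ng, fmol):
    """Length (bp) implied by a quoted mass and molar amount."""
    return ng * 1e6 / (fmol * AVG_BP_MASS)


if __name__ == "__main__":
    # Amounts quoted for the BP reaction above.
    print("pDONR201, 300 ng = 100 fmol ->",
          round(implied_length_bp(300, 100)), "bp implied")
    print("attB-tet PCR product, 80 ng = 80 fmol ->",
          round(implied_length_bp(80, 80)), "bp implied")
```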

TABLE 7 Cloning efficiency of BP Reactions.

PCR Product       CFU/ml   Fold Increase
B1-tet-B2         7500
B1.6-tet-B2       12000    1.6×
B1-tet-B2.10      20900    2.8×
B1.6-tet-B2.10    30100    4.0×

These results demonstrate that attB PCR products containing sequences that perfectly match the proposed consensus sequence for integrase core binding sites can produce Entry clones with four-fold higher efficiency than standard GATEWAY™ attB1 and attB2 PCR products.
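The fold increases reported for the BP reactions are each product's colony count divided by the count for the standard B1-tet-B2 product. A minimal sketch of that arithmetic, using the CFU/ml values from Table 7 (the function and variable names are illustrative):

```python
# CFU/ml values from Table 7.
CFU_PER_ML = {
    "B1-tet-B2": 7500,
    "B1.6-tet-B2": 12000,
    "B1-tet-B2.10": 20900,
    "B1.6-tet-B2.10": 30100,
}


def fold_increase(cfu, control="B1-tet-B2"):
    """Return each product's CFU/ml relative to the control, to one decimal."""
    base = cfu[control]
    return {name: round(value / base, 1) for name, value in cfu.items()}


if __name__ == "__main__":
    for name, fold in fold_increase(CFU_PER_ML).items():
        print(f"{name}: {fold}x")
```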

The entry clones produced above were then transferred to pDEST20 (Invitrogen Corp., Carlsbad, Calif., Cat. No. 11807-013) via LR reactions (300 ng (64 fmoles) of pDEST20 mixed with 50 ng (77 fmoles) of the respective pENTR201-tet Entry clone in a 20 μl volume; incubated for 1 hour at 25° C.). The efficiencies of cloning for these reactions are compared in Table 8.

TABLE 8 Cloning Efficiency of LR Reactions.

pENTR201-tet × pDEST20   CFU/ml   Fold Increase
L1-tet-L2                5,800
L1.6-tet-L2              8,000    1.4
L1-tet-L2.10             10,000   1.7
L1.6-tet-L2.10           9,300    1.6

These results demonstrate that the mutations introduced into attB1.6 and attB2.10 that transfer with the gene into entry clones slightly increase the efficiency of LR reactions. Thus, the present invention encompasses not only mutations in attB sites that increase recombination efficiency, but also the corresponding mutations that result in the attL sites created by the BP reaction.

To examine the increased cloning efficiency of the attB1.6-tet-attB2.10 PCR product over a range of PCR product amounts, experiments analogous to those described above were performed in which the amount of attB PCR product was titrated into the reaction mixture. The results are shown in Table 9.

TABLE 9 Titration of attB PCR products.

Amount of attB
PCR product (ng)   PCR product            CFU/ml   Fold Increase
20                 attB1-tet-attB2        3,500
                   attB1.6-tet-attB2.10   21,500   6.1
50                 attB1-tet-attB2        9,800
                   attB1.6-tet-attB2.10   49,000   5.0
100                attB1-tet-attB2        18,800
                   attB1.6-tet-attB2.10   53,000   2.8
200                attB1-tet-attB2        19,000
                   attB1.6-tet-attB2.10   48,000   2.5

These results demonstrate that as much as a six-fold increase in cloning efficiency is achieved with the attB1.6-tet-attB2.10 PCR product as compared to the standard attB1-tet-attB2 PCR product at the 20 ng amount.

Example 12

Determination of attB Sequence Requirements for Optimum Recombination Efficiency

To examine the sequence requirements for attB and to determine which attB sites would clone with the highest efficiency from populations of degenerate attB sites, a series of experiments was performed. Degenerate PCR primers were designed which contained five bases of degeneracy in the B-arm of the attB site. These degenerate sequences would thus transfer with the gene into Entry clones in BP reactions and subsequently be transferred with the gene into expression clones in LR reactions. The populations of degenerate attB and attL sites could thus be cycled from attB to attL back and forth for any number of cycles. By altering the reaction conditions at each transfer step (for example, by decreasing the reaction time and/or decreasing the concentration of DNA) the reaction can be made increasingly stringent at each cycle and thus enrich for populations of attB and attL sites that react more efficiently.

The following degenerate PCR primers were used to amplify a 500 bp fragment from pUC18 which contained the lacZ alpha fragment (only the attB portion of each primer is shown):

attB1: (SEQ ID NO: 47) GGGG ACAAGTTTGTACAAA AAAGC AGGCT
attB1n16-20: (SEQ ID NO: 106) GGGG ACAAGTTTGTACAAA nnnnn AGGCT
attB1n21-25: (SEQ ID NO: 107) GGGG ACAAGTTTGTACAAA AAAGC nnnnn
attB2: (SEQ ID NO: 48) GGGG ACCACTTTGTACAAG AAAGC TGGGT
attB2n16-20: (SEQ ID NO: 108) GGGG ACCACTTTGTACAAG nnnnn TGGGT
attB2n21-25: (SEQ ID NO: 109) GGGG ACCACTTTGTACAAG AAAGC nnnnn

The starting population size of degenerate att sites is 4⁵ or 1024 molecules. Four different populations were transferred through two BP reactions and two LR reactions. Following transformation of each reaction, the population of transformants was amplified by growth in liquid media containing the appropriate selection antibiotic. DNA was prepared from the population of clones by alkaline lysis miniprep and used in the next reaction. The results of the BP and LR cloning reactions are shown below.
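The population size follows from five degenerate positions with four possible bases each. The sketch below expands a degenerate primer into its individual sequences to illustrate the count; the function name is illustrative, and the template is the attB1n16-20 sequence listed above with spaces removed.

```python
from itertools import product


def expand_degenerate(template):
    """Expand every 'n' in a primer template into A, C, G and T."""
    slots = ["ACGT" if ch in "nN" else ch for ch in template]
    return ["".join(combo) for combo in product(*slots)]


if __name__ == "__main__":
    attB1n16_20 = "GGGGACAAGTTTGTACAAAnnnnnAGGCT"
    population = expand_degenerate(attB1n16_20)
    print(len(population))  # number of distinct sequences in the population
    # The standard attB1 sequence is one member of the population.
    print("GGGGACAAGTTTGTACAAAAAAGCAGGCT" in population)
```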

BP-1, Overnight Reactions:

                           cfu/ml   percent of control
attB1-lacZa-attB2          78,500   100%
attB1n16-20-lacZa-attB2    1,140    1.5%
attB1n21-25-lacZa-attB2    11,100   14%
attB1-lacZa-attB2n16-20    710      0.9%
attB1-lacZa-attB2n21-25    16,600   21%

LR-1, pENTR201-lacZa×pDEST20/EcoR1, 1 Hour Reactions

                           cfu/ml   percent of control
attL1-lacZa-attL2          20,000   100%
attL1n16-20-lacZa-attL2    2,125    11%
attL1n21-25-lacZa-attL2    2,920    15%
attL1-lacZa-attL2n16-20    3,190    16%
attL1-lacZa-attL2n21-25    1,405    7%

BP-2, pEXP20-lacZa/ScaI×pDONR201, 1 Hour Reactions

                           cfu/ml   percent of control
attB1-lacZa-attB2          48,600   100%
attB1n16-20-lacZa-attB2    22,800   47%
attB1n21-25-lacZa-attB2    31,500   65%
attB1-lacZa-attB2n16-20    42,400   87%
attB1-lacZa-attB2n21-25    34,500   71%

LR-2, pENTR201-lacZa×pDEST6/NcoI, 1 Hour Reactions

                           cfu/ml   percent of control
attL1-lacZa-attL2          23,000   100%
attL1n16-20-lacZa-attL2    49,000   213%
attL1n21-25-lacZa-attL2    18,000   80%
attL1-lacZa-attL2n16-20    37,000   160%
attL1-lacZa-attL2n21-25    57,000   250%

These results demonstrate that at each successive transfer, the cloning efficiency of the entire population of att sites increases, and that there is a great deal of flexibility in the definition of an attB site. Specific clones may be isolated from the above reactions, tested individually for recombination efficiency, and sequenced. Such new specificities may then be compared to known examples to guide the design of new sequences with new recombination specificities. In addition, based on the enrichment and screening protocols described herein, one of ordinary skill can easily identify and use sequences in other recombination sites (e.g., other att sites, lox, FRT, etc.), that result in increased specificity in the recombination reactions using nucleic acid molecules containing such sequences.

Example 13

Embedding of Functional Components in Recombination Sites

Recombination sites used with the invention may also have embedded functions or properties. An embedded functionality is a function or property conferred by a nucleotide sequence in a recombination site which is not directly associated with recombination efficiency or specificity. For example, recombination sites may contain protein coding sequences (e.g., intein coding sequences), intron/exon splice sites, origins of replication, and/or stop codons. In general, the longer the stretch of nucleic acid which makes up a recombination site, the more amenable the site will be to the incorporation of embedded functions or properties. Conversely, longer recombination sites will be more likely to have features (e.g., stop codons) which interfere with desired functions or properties. Further, recombination sites which have more than one (e.g., two, three, four, five, etc.) embedded functions or properties may also be prepared.

As explained below, in one aspect, the invention provides methods for removing nucleotide sequences encoded by recombination sites from RNA molecules. One example of such a method employs the use of intron/exon splice sites to remove RNA encoded by recombination sites from RNA transcripts. Again, as explained below, nucleotide sequences which encode these intron/exon splice sites may be fully or partially embedded in the recombination sites which encode sequences excised from RNA molecules or these intron/exon splice sites may be encoded by adjacent nucleic acid sequence. Similarly, one intron/exon splice site may be encoded by a recombination site and another intron/exon splice site may be encoded by other nucleotide sequences (e.g., nucleic acid sequences of the vector or a nucleic acid of interest). Nucleic acid splicing is discussed in the following publications: R. Reed, Curr. Opin. Genet. Devel. 6:215-220 (1996); S. Mount, Nucl. Acids. Res. 10:459-472, (1982); P. Sharp, Cell 77:805-815, (1994); K. Nelson and M. Green, Genes and Devel. 23:319-329 (1988); and T. Cooper and W. Mattox, Am. J. Hum. Genet. 61:259-266 (1997).

In some instances it will be advantageous to remove either RNA corresponding to recombination sites from RNA transcripts or amino acid residues encoded by recombination sites. Removal of such sequences can be performed in several ways and can occur at either the RNA or protein level. One instance where it will generally be advantageous to remove RNA transcribed from a recombination site will be where a nucleic acid molecule which contains an ORF is inserted into a vector in an orientation which is intended to result in the expression of a fusion protein between amino acid residues encoded by the ORF and amino acid residues encoded by the vector (e.g., GFP). In such an instance, the presence of an intervening recombination site between the ORF and the vector coding sequences may result in the recombination site (1) contributing codons to the mRNA which results in the inclusion of additional amino acid residues in the expression product, (2) contributing a stop codon to the mRNA which prevents the production of the desired fusion protein, and/or (3) shifting the reading frame of the mRNA such that the two proteins are not fused “in-frame.”
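The three effects listed above can be checked for any candidate site sequence. The following is an illustrative sketch only: the function name is an assumption, the analysis looks at the sense strand as written, and the example uses the 25 bp attB1 sequence given elsewhere in this document (without the leading GGGG).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def site_effects(site):
    """Report length, frame shift and in-frame stop codons for a site.

    'frame_shift' is the site length modulo 3 (0 means the downstream
    reading frame is preserved). 'stops' maps each of the three forward
    reading frames to the stop codons found in that frame.
    """
    seq = site.upper()
    stops = {}
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        stops[frame] = [c for c in codons if c in STOP_CODONS]
    return {"length": len(seq), "frame_shift": len(seq) % 3, "stops": stops}


if __name__ == "__main__":
    attB1 = "ACAAGTTTGTACAAAAAAGCAGGCT"  # 25 bp attB1 from this document
    print(site_effects(attB1))
```

A nonzero frame shift means flanking sequence must be added or removed to keep the fused coding regions in frame, which is one reason splicing out the site can be attractive.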

One method for removing recombination sites from mRNA molecules involves the use of intron/exon splice sites (i.e., splice donor and splice acceptor sites). Splice sites can be suitably positioned in a number of locations. Using a Destination Vector designed to express an inserted ORF with an N-terminal GFP fusion as an example, the first splice site could be encoded by vector sequences located 3′ to the GFP coding sequences and the second splice site could be partially embedded in the recombination site which separates the GFP coding sequences from the coding sequences of the ORF. Further, the second splice site either could abut the 3′ end of the recombination site or could be positioned a short distance (e.g., 2, 4, 8, or 10 nucleotides) 3′ to the recombination site. In addition, depending on the length of the recombination site, the second splice site could be fully embedded in the recombination site.

A modification of the method described above involves the connection of multiple nucleic acid segments which, upon expression, results in the production of a fusion protein. In one specific example, one nucleic acid segment encodes GFP and another nucleic acid segment contains an ORF of interest. Each of these segments is flanked by recombination sites. In addition, the nucleic acid segment which encodes GFP contains an intron/exon splice site near its 3′ terminus and the nucleic acid segment which contains the ORF of interest also contains an intron/exon splice site near its 5′ terminus. Upon recombination, the nucleic acid segment which encodes GFP is positioned 5′ to the nucleic acid segment which encodes the ORF of interest. Further, these two nucleic acid segments are separated by a recombination site which is flanked by intron/exon splice sites. Excision of the intervening recombination site thus occurs after transcription of the fusion mRNA. Thus, in one aspect, the invention is directed to methods for removing RNA transcribed from recombination sites from transcripts generated from nucleic acids described herein.

One method which could be used to introduce intron/exon splice sites into nucleic acid segments is by the use of PCR. For example, primers could be used to generate nucleic acid segments corresponding to an ORF of interest and containing both a recombination site and an intron/exon splice site.

The above methods can also be used to remove RNA corresponding to recombination sites when the nucleic acid segment which is recombined with another nucleic acid segment encodes RNA which is not produced in a translatable format. One example of such an instance is where a nucleic acid segment is inserted into a vector in a manner which results in the production of antisense RNA. As discussed below, this antisense RNA may be fused, for example, with RNA which encodes a ribozyme. Thus, the invention also provides methods for removing RNA corresponding to recombination sites from such molecules.

The invention further provides methods for removing amino acid sequences encoded by recombination sites from protein expression products by protein splicing. Nucleotide sequences which encode protein splice sites may be fully or partially embedded in the recombination sites which encode amino acid sequences excised from proteins or protein splice sites may be encoded by adjacent nucleotide sequences. Similarly, one protein splice site may be encoded by a recombination site and another protein splice site may be encoded by other nucleotide sequences (e.g., nucleic acid sequences of the vector or a nucleic acid of interest).

It has been shown that protein splicing can occur by excision of an intein from a protein molecule and ligation of flanking segments. (See, e.g., Derbyshire et al., Proc. Natl. Acad. Sci. (USA) 95:1356-1357 (1998).) In brief, inteins are amino acid segments which are post-translationally excised from proteins by a self-catalytic splicing process. A considerable number of intein consensus sequences have been identified. (See, e.g., Perler, Nucleic Acids Res. 27:346-347 (1999).)

Similar to intron/exon splicing, N- and C-terminal intein motifs have been shown to be involved in protein splicing. Thus, the invention further provides compositions and methods for removing amino acid residues encoded by recombination sites from protein expression products by protein splicing. In particular, this aspect of the invention is related to the positioning of nucleic acid sequences which encode intein splice sites on both the 5′ and 3′ end of recombination sites positioned between two coding regions. Thus, when the protein expression product is incubated under suitable conditions, amino acid residues encoded by these recombination sites will be excised.

Protein splicing may be used to remove all or part of the amino acid sequences encoded by recombination sites. Nucleic acid sequences which encode inteins may be fully or partially embedded in recombination sites or may be adjacent to such sites. In certain circumstances, it may be desirable to remove considerable numbers of amino acid residues beyond the N- and/or C-terminal ends of amino acid sequences encoded by recombination sites. In such instances, intein coding sequences may be located a distance (e.g., 30, 50, 75, 100, etc. nucleotides) 5′ and/or 3′ to the recombination site.

While conditions suitable for intein excision will vary with the particular intein, as well as the protein which contains this intein, Chong et al., Gene 192:271-281 (1997), have demonstrated that a modified Saccharomyces cerevisiae intein, referred to as Sce VMA intein, can be induced to undergo self-cleavage by a number of agents including 1,4-dithiothreitol (DTT), β-mercaptoethanol, and cysteine. For example, intein excision/splicing can be induced by incubation in the presence of 30 mM DTT, at 4° C. for 16 hours.

Example 14

Removal of att Sites from RNA Transcripts by Pre-mRNA Splicing in Eukaryotic Cells

Consensus RNA sequences in metazoan cells needed for removal of introns by splicing of pre-mRNA transcripts normally contain the following three elements:

1). At the 5′ end of the intron: exon-AG|GTRAGT-intron; where | denotes the border between the intron and exon, and R=purine nucleotide. This element is referred to herein as (GT);

2). At the 3′ end of the intron: intron-Yn-N-CAG|G-exon; where Yn=a pyrimidine-rich sequence of 10-12 nucleotides. This element is referred to herein as (Yn-AG);

3). At the branch point within the intron, approximately 20-40 bases 5′ to (Yn-AG): YNRA*Y; where Y is a pyrimidine nucleotide and A* is the branch point adenosine that participates in the initial transesterification reaction to form an RNA lariat. This element is referred to herein as (BP-A*).

Underlined sequences shown above are those highly conserved and are generally believed to be required for splicing activity; other nucleotides in the consensus sequences are less highly conserved.
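The three consensus elements can be written as simple pattern searches to scan a candidate construct. This is a rough sketch under stated assumptions: the sequence is given in the DNA alphabet (T rather than U), Yn is treated as a strict run of 10-12 pyrimidines even though the text describes it only as pyrimidine-rich, and the function names are illustrative. A match indicates only that the consensus is present, not that splicing will occur.

```python
import re

# (GT): exon-AG|GTRAGT-intron, R = A or G
DONOR = re.compile(r"(?=AGGT[AG]AGT)")
# (Yn-AG): intron-Yn-N-CAG|G-exon, Yn = 10-12 pyrimidines (strict here)
ACCEPTOR = re.compile(r"(?=([CT]{10,12}[ACGT]CAG)G)")
# (BP-A*): YNRA*Y
BRANCH = re.compile(r"(?=[CT][ACGT][AG]A[CT])")


def donor_boundaries(seq):
    """0-based index of the first intron base at each 5' splice site."""
    return [m.start() + 2 for m in DONOR.finditer(seq.upper())]


def acceptor_boundaries(seq):
    """0-based index of the first exon base at each 3' splice site."""
    hits = {m.start() + len(m.group(1)) for m in ACCEPTOR.finditer(seq.upper())}
    return sorted(hits)


def branch_adenosines(seq):
    """0-based index of each candidate branch point adenosine (A*)."""
    return [m.start() + 3 for m in BRANCH.finditer(seq.upper())]


if __name__ == "__main__":
    print(donor_boundaries("CCAGGTAAGTCC"))
    print(acceptor_boundaries("ATTTTCCCTTTACAGGT"))
    print(branch_adenosines("GGCTAACGG"))
```

The lookahead form lets overlapping candidate sites be reported, which matters for the short branch point motif.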

1. attB Splicing

These splicing elements can be combined with GATEWAY™ att-site-containing vectors in at least the following three ways to remove attB1 sites by RNA splicing.

Method 1: (GT)-(BP-A*)[attB1](Yn-AG)-ORF

In this method, the (BP-A*) element is located just 5′ to the end of attB1, and the (Yn-AG) consensus is merged with the 3′-end of the attB1 sequence, exploiting the flexibility of the 5 nucleotides flanking the core of the attB sequence. The (GT) consensus can be positioned conveniently ten or more nucleotides upstream from the (BP-A*) element.

This arrangement has the advantage that it requires a minimum sequence addition between the 3′ end of the attB1 site and the sequence encoding the ORF. A potential difficulty with the use of this approach is that the pyrimidine-rich sequence in (Yn-AG) overlaps with the attB1 sequence, which is relatively purine rich. Thus, in certain instances, sufficient nucleotide changes (to C or T) in the attB1 site to permit efficient splicing may not be compatible with efficient B×P recombination.

Sequences positioned 5′ to the recombination cleavage site within attB1 are contributed in Expression Clones by the Destination Vector, while sequences 3′ to this site are derived (in most cases) from an attB-PCR product. If the splicing reaction is intended to fuse RNA encoding an N-terminal protein (contributed by a Destination Vector) to RNA encoding another ORF (contributed by an Entry Clone), (GT) and (Yn-AG) will generally be positioned so that the spliced product maintains the desired translational reading frame.

Method 2: (GT)-(BP-A*)[attB1](Yn-AG)-ORF

In this method, the (Yn-AG) consensus is immediately next to the attB1 site; consequently the branch point A* in the (BP-A*) element will generally need to be close to the attB1 site. Thus, the distance from AG in (Yn-AG) will generally be no more than about 40 nucleotides.

The (Yn-AG) sequence can be added as part of a primer adapter, assuming the Entry Clone is constructed using attB-PCR. Further, this primer can be designed using a consensus (Yn-AG) sequence which favors efficient splicing. In some instances, the presence of the attB1 sequence between (BP-A*) and (Yn-AG) may interfere with splicing. In such cases, the attB1 sequence can be mutated to accommodate a more optimal splicing sequence.

Method 3: (GT)-[attB1]-(BP-A*)-(Yn-AG)-ORF

This method employs an arrangement which allows one to choose an optimal splicing sequence and spacing for the combined elements comprising (BP-A*)-(Yn-AG). The minimum size for this combination is expected to be about 20 nucleotides. Therefore this sequence will normally be added to PCR products as an attB1-primer adapter of about 45-50 nucleotides.

Similar considerations apply to designing sequences that allow splicing to remove the attB2 site from mRNA. But since in this case (BP-A*) and (Yn-AG) can be contributed by the Destination Vector, the most attractive option is:

ORF-(GT)[attB2]-(BP-A*)-(Yn-AG), where the sequence between (GT) and attB2 is minimized, to reduce the size of the attB2-PCR adapter primer. Minimized sequences suitable for use in particular cases can be determined experimentally using methods described herein.

Another way to produce a vector that splices attB sites is to construct a vector directly that contains splicing signals flanking the attB1 and attB2 sites. The main difference from the approaches described above is that any sequences added there using attB primer adapters (as in Methods 2 and 3) could be pre-installed into the vector itself next to a multiple cloning site positioned between the attB sites.

2. attL Splicing

The sequences encoding attL1 and attL2 sites may be removed from transcripts by RNA splicing. However, the 100 nucleotide length of attL imposes a constraint on the options for arranging the splicing sequence elements. This distance is generally too great for the placement of attL1 between (BP-A*) and (Yn-AG). One alternative which can be employed is that either or both of these elements can be embedded in a mutated version of attL1. Another approach is that these elements (i.e., (BP-A*) and (Yn-AG)) can be contributed by an attB-adapter primer and (GT) can be provided by the attP Donor plasmid. By recombining these elements in a B×P reaction an entry clone with splice sites for splicing attB1 is created.

Similarly, for splicing of attL2, there is no practical limit to the length of sequence allowed between (GT) and (BP-A*). So (GT) could be provided on the attB2 adapter primer, while (BP-A*) and (Yn-AG) would be contributed by the attP Donor Vector. For such uses, the attP Donor Vector will generally need to contain a eukaryotic promoter and the rrnB transcription termination sequences will generally need to be removed. The potential for an adverse effect of the attL2 sequence between (GT) and (BP-A*) seems low, but may need to be determined on a case by case basis.

A potential advantage of splicing attL sequences from Entry Clonetranscripts is that users could clone and express PCR products directlyas Entry Clones, without need for further subcloning into a DestinationVector. Further, the presence of a termination codon in our attL1sequence, which appears difficult to remove without diminishing L×Rrecombination, would be of no consequence to translation of ORFs fusedwith N-terminal peptides.

The above describes some applications of RNA splicing with the GATEWAY™ system, which are to remove attB1 between ORF and N-terminal sequences and to remove attB2 sequences between ORF sequences and C-terminal sequences of protein fusions. Other applications would be apparent to one skilled in the art. Further, one such application is the use of the RNA splicing process to remove att sequences interposed (as a result of performing a GATEWAY™ recombination-based subcloning reaction) between the sequences encoding multiple protein domains in a eukaryotic expression vector, where the ORFs encoding the various domains are separated by an att site sequence. Such vectors can be constructed readily by GATEWAY™ recombination with att sites of multiple specificities, such as att1, att2, att3, att4, etc. Although this approach permits rapid construction of protein fusions, as well as shuffling of DNA sequences encoding protein domains, the recombination products typically will contain 25 bp attB sites (or 100 bp attL sites) intervening between these domains, whose removal often will be desirable. The RNA splicing mechanism described is one way to remove these intervening sequences. The use of splicing to remove att sites between multiple protein domains also makes it practical to make these constructs using GATEWAY™ recombination reactions between attB and attP sites, which yield attL and attR sites. This is because either type of att sequence (attB or attL) could be removed by an RNA splicing reaction in a properly designed vector. In other situations, it will be useful to remove by splicing attR and/or attP sites as well.

A second application addresses the common problem of obtaining copies of large or rare mRNAs. Some mRNAs are difficult to reverse transcribe (into cDNA) in their entirety due to their large size and/or low abundance. Often, one or both ends of the cDNA can be obtained, but the entire sequence as one molecule is unobtainable. When two or more different portions of the cDNA are available which together constitute the entire mRNA sequence, the sequence of these cDNA sequences can be determined and PCR primers synthesized. Then using attB-primers each non-overlapping portion of the entire transcript can be amplified by PCR. These amplified sequences then can be combined in the proper order using GATEWAY™ recombination. Such a recombination product will comprise the various sequences in their proper order, but separated by att sites. Given the appropriate transcription promoter and termination signals, such constructs can be used to prepare RNA either in vitro for use in an in vitro splicing reaction, or to transfect metazoan cells with an appropriate construct allowing transcription followed by RNA splicing within the cell. In this manner, transcripts of the authentic mRNA can then be produced. Such mRNA transcripts can be used directly for studies of biological function of the protein encoded by the spliced transcript. Alternatively, because the transcripts can be produced in abundance with this approach, it becomes more feasible to produce a cDNA copy of the spliced RNA. This cDNA, which lacks the intervening att sequences, is useful for producing the encoded protein in cells lacking the proper splicing machinery, such as E. coli.

A third application of this technology makes it possible to produce replicas of mRNAs that are difficult to obtain due to their low abundance or lack of suitable tissue sources. Most metazoan genes encoding proteins consist of exon sequences separated by intron sequences. Whenever exon-intron borders of a gene can be predicted accurately from genomic DNA sequences by bioinformatic algorithms, PCR products flanked by att site sequences can be synthesized that contain the exon sequences. With proper design of the att sequences flanking these products, they can be linked together in the proper order, while preserving the correct translational reading frame, using GATEWAY™ recombination. By including the appropriate transcription signals, these constructs can serve as templates to synthesize an RNA transcript containing the ordered exon sequences, each separated by an att sequence. Given that the appropriate splicing signals are included in these constructs, the transcripts produced will be processed by the splicing reactions of metazoan cells to yield nucleic acids which correspond to naturally produced mRNA sequences. In this manner one can eliminate the need first to isolate mRNA from cells. Further, cells producing such mRNA from splicing of transcripts made as described above can be used directly for studies of biological function or as a source of a desired mRNA to produce its cDNA. Alternatively, these constructs could be spliced in vitro using properly constituted splicing extracts.

Example 15 Determination of Gene Expression Profiles of Cells

The invention further provides compositions and methods for cloning and sequencing multiple cDNA molecules. In general, these methods involve generating concatamers of cDNA molecules and performing sequencing reactions on these molecules to determine the nucleotide sequences of the individual inserts. Such methods are particularly useful for determining the gene expression profile of particular cells and/or tissues. One example of such a method, as well as a vector produced by the described method, is shown in FIG. 23.

The vector shown in FIG. 23 contains a series of relatively short cDNA inserts (e.g., 10, 15, 20, 25, 30, 45, or 50 nucleotides in length) connected to each other by attB sites. The vector shown in FIG. 23 also contains sequencing primer sites adjacent to each side of the cDNA insertion site.

Nucleic acid molecules which represent genes expressed in a cell or tissue may be broken into relatively small fragments in a number of ways, including mechanical shearing, digestion with one or a combination of restriction enzymes (e.g., NlaIII, Sau3A, etc.), or digestion with an endonuclease having little or no sequence specificity (e.g., Micrococcal nuclease, DNAseI, etc.). The conditions will generally be adjusted so that nucleic acid fragments of a specific average size are produced. Further, if desired, nucleic acid fragments of a particular size can be isolated before insertion into a vector. Methods of separating nucleic acid molecules based on size are known in the art and include column chromatography and gel electrophoresis (e.g., agarose and polyacrylamide gel electrophoresis).

Nucleotide sequence data may be obtained by sequencing nucleic acids connected by methods of the invention and inserted in a sequencing vector using standard methods known in the art. In most instances, neither the 5′ to 3′ orientation of the nucleic acid inserts in the sequencing vector nor the strand which is sequenced will be relevant for determining the gene expression profile of a cell or tissue. This is so because it will generally be possible to determine the identity of the mRNA from which the sequenced nucleic acid was derived regardless of the orientation of the sequenced nucleic acid segment or the strand which is sequenced.

Thus, the invention provides methods for determining the gene expression profile of cells and/or tissues. In one aspect, the invention provides methods for determining the gene expression profile of cells and/or tissues, comprising (a) generating one or more populations of cDNA molecules from RNA obtained from the cells and/or tissues, wherein the individual cDNA molecules of these populations comprise at least two recombination sites capable of recombining with at least one recombination site present on the individual members of the same or a different population of cDNA molecules, (b) contacting the nucleic acid molecules of (a) with one or more recombination proteins under conditions which cause the nucleic acid molecules to join, and (c) determining the sequence of the joined nucleic acid molecules.
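As an illustration of step (c), the following is a minimal sketch of how a sequenced concatamer might be split into individual cDNA tags and tallied. The linker sequence, the tag sequences, and the function names are assumptions made for this example only; they are not taken from FIG. 23. Because neither insert orientation nor the sequenced strand matters for identifying the source mRNA, each tag is reduced to one representative of itself and its reverse complement before counting.

```python
from collections import Counter

# Hypothetical att linker joining the cDNA tags (illustrative assumption).
ATT_SITE = "CAACTTTGTATAATAAAGTTG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(tag: str) -> str:
    """Pick one fixed representative of a tag and its reverse complement."""
    return min(tag, reverse_complement(tag))


def count_tags(read: str, site: str = ATT_SITE) -> Counter:
    """Split a concatamer read at each att site and tally the tags."""
    read = read.upper()
    # Sites read from the opposite strand are normalised so one split suffices.
    read = read.replace(reverse_complement(site), site)
    tags = [t for t in read.split(site) if t]
    return Counter(canonical(t) for t in tags)


if __name__ == "__main__":
    tag_a = "ATGGCCAAGTTGACC"   # made-up tag
    tag_b = "GGATCCTTAGCACGT"   # made-up tag
    # tag_a appears twice, once in each orientation.
    read = ATT_SITE.join([tag_a, tag_b, reverse_complement(tag_a)])
    for tag, n in count_tags(read).items():
        print(tag, n)
```

The relative tag counts obtained this way would serve as a rough proxy for relative transcript abundance; mapping each tag back to a gene is outside the scope of this sketch.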

Example 16 Use of GATEWAY™ System to Clone the Tet and LacZ Genes

The following attB sites were added to PCR primers which were synthesized by standard methods. The attB1 and attB2 sites are shown in the standard GATEWAY™ reading frame (see GATEWAY™ Cloning Technology Instruction Manual (Invitrogen Corp., Carlsbad, Calif.)), which is indicated below. The reading frame of attB5 may be altered as appropriate. The selection of a reading frame can be used to generate fusion proteins.

attB1 (5′-end of fragment A): (SEQ ID NO: 110) GGGG ACA ACT TTG TAC AAA AAA GTT GNN

attB5 (3′-end of fragment A): (SEQ ID NO: 111) GGGG A CAA CTT TGT ATA ATA AAG TTG

attB5R (5′-end of fragment B): (SEQ ID NO: 112) GGGG A CAA CTT TAT TAT ACA AAG TTG

attB2 (3′-end of fragment B): (SEQ ID NO: 113) GGG AC AAC TTT GTA TAA TAA AGT TGN

Nucleic acid fragments encoding the tet gene (primed with 5′-attB1 and 3′-attB5) and the lacZ gene (primed with 5′-attB5R and 3′-attB2) were amplified by PCR and precipitated using polyethylene glycol as follows. 150 μl of TE is added to a 50 μl PCR reaction, followed by the addition of 100 μl of 30% PEG8000, 30 mM MgCl₂. The solution is then mixed and centrifuged at about 10,000×g at room temperature for 15 minutes. The PEG solution is then removed and the pellet is dissolved in TE.
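The final concentrations in the precipitation mixture follow from simple dilution arithmetic. The short sketch below works them out from the volumes stated above; the function and variable names are illustrative assumptions, and the calculation assumes the volumes are additive.

```python
def final_concentration(stock_conc: float, stock_volume_ul: float,
                        total_volume_ul: float) -> float:
    """Dilution rule C_final = C_stock * V_stock / V_total."""
    return stock_conc * stock_volume_ul / total_volume_ul


pcr_ul = 50.0       # PCR reaction
te_ul = 150.0       # TE added
peg_mix_ul = 100.0  # 30% PEG8000, 30 mM MgCl2

total_ul = pcr_ul + te_ul + peg_mix_ul

peg_percent = final_concentration(30.0, peg_mix_ul, total_ul)  # percent PEG8000
mgcl2_mm = final_concentration(30.0, peg_mix_ul, total_ul)     # mM MgCl2

if __name__ == "__main__":
    print(f"total volume: {total_ul:.0f} ul")
    print(f"PEG8000: {peg_percent:.0f}%  MgCl2: {mgcl2_mm:.0f} mM")
```

Under these assumptions the mixture is 300 μl at 10% PEG8000 and 10 mM MgCl₂, with any contribution from the PCR buffer itself ignored.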

The B1-tet-B5 PCR product was mixed with an attP1-ccdB-attP5 donor vector (pDONR-P1/P5) and reacted with BP CLONASE™ using a standard protocol (see Example 3 herein) to generate an attL1-tet-attL5 entry clone. The B5R-lacZ-B2 PCR product was mixed with an attP5R-ccdB-attP2 donor vector (pDONR-P5R/P2) and reacted with BP CLONASE™ to generate an attR5-lacZ-attL2 entry clone.

After incubation for 1-4 hours at 25° C., 2 μl of Proteinase K (2 mg/ml) was added to stop the BP reactions. DH5α cells were then transformed with the LR vectors (i.e., entry clones) and plated on LB-Kan plates. The plates were incubated overnight at 25° C. Miniprep DNA was prepared from individual DH5α colonies and quantitated by agarose gel electrophoresis.

An LR CLONASE™ reaction was prepared in a reaction volume of 20 μl containing the following components:

60 ng (25 fmoles) of the supercoiled tet entry clone

75 ng (20 fmoles) of the supercoiled lacZ entry clone

150 ng (35 fmoles) of pDEST6 (described in PCT Publication WO 00/52027, the entire disclosure of which is incorporated herein by reference) linearized with NcoI

4 μl of LR4 reaction buffer

4 μl of LR CLONASE™
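The mass and molar amounts listed above are related through the length of each DNA molecule. As a hedged sketch, the conversion below assumes an average mass of about 660 g/mol per base pair of double-stranded DNA, a common approximation that is not stated in this example. The plasmid lengths it prints are inferred from the listed ng and fmol values purely for illustration and should not be read as the actual sizes of the constructs.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (assumed approximation)


def ng_to_fmol(mass_ng: float, length_bp: float) -> float:
    """Convert a mass of dsDNA (ng) to an amount (fmol) for a given length."""
    # ng -> g is 1e-9, mol -> fmol is 1e15, so the net factor is 1e6.
    return mass_ng * 1e6 / (length_bp * AVG_BP_MASS)


def implied_length_bp(mass_ng: float, amount_fmol: float) -> float:
    """Length (bp) that would make mass_ng correspond to amount_fmol."""
    return mass_ng * 1e6 / (amount_fmol * AVG_BP_MASS)


# (ng, fmol) pairs as listed in the reaction components above.
components = {
    "tet entry clone": (60.0, 25.0),
    "lacZ entry clone": (75.0, 20.0),
    "pDEST6": (150.0, 35.0),
}

if __name__ == "__main__":
    for name, (ng, fmol) in components.items():
        length = implied_length_bp(ng, fmol)
        print(f"{name}: about {length:.0f} bp implied by {ng:.0f} ng = {fmol:.0f} fmol")
```

The same function can be used in the forward direction to work out how many nanograms of a plasmid of known length are needed to reach a target molar amount.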

The reaction was incubated at 25° C. overnight and stopped with 2 μl of proteinase K solution (2 mg/ml). 2 μl was used to transform 100 μl of LE DH5α cells and plated on LBamp plates containing XGal. Approximately 35,000 colonies were generated in the transformation mixture with cells at an efficiency of 1.6×10⁸ cfu/μg of pUC DNA. All the colonies appeared blue, indicating the presence of the lacZ gene. 24 colonies were streaked onto plates containing tetracycline and XGal. 24 out of 24 colonies were tetracycline resistant. 15 colonies were used to inoculate 2 ml of LBamp broth for minipreps. 15/15 minipreps contained a supercoiled plasmid of the correct size (8.8 kb). Three miniprep DNAs were digested with EcoRV. A banding pattern was observed that was consistent with the two fragments cloned in the correct orientation.

The resulting nucleic acid product consists of the two fragments linked together and cloned into the destination vector. The structure of these two fragments, as they are inserted into the destination vector, is as follows (arrows indicate the orientation of attB sites with respect to the overlap sequence):

attB1→tet←attB5-lacZ←attB2

Example 17 Use of GATEWAY™ System to Clone the Tet, LacZ and Neo Genes

The following attB sites are added to PCR primers which are synthesized by standard methods. The attB1 and attB2 sites are shown in the standard GATEWAY™ reading frame (see GATEWAY™ Cloning Technology Instruction Manual (Invitrogen Corp., Carlsbad, Calif.)), which is indicated below. The reading frame of attB5 and attB21 may be specified by the user.

attB1 (5′-end of fragment A): (SEQ ID NO: 114) GGGG ACA ACT TTG TAC AAA AAA GTT GNN

attB5 (3′-end of fragment A): (SEQ ID NO: 111) GGGG A CAA CTT TGT ATA ATA AAG TTG

attB5R (5′-end of fragment B): (SEQ ID NO: 112) GGGG A CAA CTT TAT TAT ACA AAG TTG

attB21R (3′-end of fragment B): (SEQ ID NO: 115) GGG A CAA CTT TTT AAT ACA AAG TTG

attB21 (5′-end of fragment C): (SEQ ID NO: 116) GGGG A CAA CTT TGT ATT AAA AAG TTG

attB2 (3′-end of fragment C): (SEQ ID NO: 117) GGGG AC AAC TTT GTA TAA TAA AGT TGN
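The relationship between each site and its "R" counterpart can be checked directly from the sequences listed above: the last 21 nucleotides of attB5R and attB21R are the reverse complements of the last 21 nucleotides of attB5 and attB21, respectively. The sketch below performs that check. Treating the final 21 nucleotides as the site proper (i.e., ignoring the leading G and A residues) is an assumption made for this illustration, as are the function names.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def site_core(primer_5p: str, core_len: int = 21) -> str:
    """Strip spaces and keep the last core_len nucleotides of the att portion."""
    return primer_5p.replace(" ", "")[-core_len:]


# att portions of the primers as listed above (SEQ ID NOs: 111, 112, 116, 115).
ATT_B5 = "GGGG A CAA CTT TGT ATA ATA AAG TTG"
ATT_B5R = "GGGG A CAA CTT TAT TAT ACA AAG TTG"
ATT_B21 = "GGGG A CAA CTT TGT ATT AAA AAG TTG"
ATT_B21R = "GGG A CAA CTT TTT AAT ACA AAG TTG"


def is_reversed_pair(site: str, site_r: str) -> bool:
    """True if site_r carries the reverse complement of site's 21-nt core."""
    return site_core(site_r) == reverse_complement(site_core(site))


if __name__ == "__main__":
    print("attB5 / attB5R:", is_reversed_pair(ATT_B5, ATT_B5R))
    print("attB21 / attB21R:", is_reversed_pair(ATT_B21, ATT_B21R))
    print("attB5 / attB21R:", is_reversed_pair(ATT_B5, ATT_B21R))
```

A check of this kind may be useful when designing primers for additional specificities, since a mistyped "R" primer would fail it.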

Nucleic acid fragments encoding the tet gene (primed with 5′-attB1 and 3′-attB5), the Neo gene (primed with 5′-attB5R and 3′-attB21R), and the lacZ gene (primed with 5′-attB21 and 3′-attB2) were amplified by PCR and precipitated using polyethylene glycol.

The B1-tet-B5 PCR product was mixed with an attP1-ccdB-attP5 donor vector (pDONR-P1/P5) and reacted with BP CLONASE™ using a standard protocol to generate an attL1-tet-attL5 entry clone. The B5R-Neo-B21R PCR product was mixed with an attP5R-ccdB-attP21R donor vector (pDONR-P5R/P21R) and reacted with BP CLONASE™ to generate an attR5-Neo-attR21 entry clone. The B21-lacZ-B2 PCR product was mixed with an attP21-ccdB-attP2 donor vector (pDONR-P21/P2) and reacted with BP CLONASE™ to generate an attL21-lacZ-attL2 entry clone.

An LR CLONASE™ reaction was prepared in a reaction volume of 20 μl containing the following components:

40 ng (17 fmoles) of the supercoiled tet entry clone

50 ng (19 fmoles) of the supercoiled or linear (VspI digested) Neo entry clone

75 ng (20 fmoles) of the supercoiled lacZ entry clone

150 ng (35 fmoles) of pDEST6 linearized with NcoI

4 μl of LR4 reaction buffer (200 mM Tris HCl (pH 7.5), 4.75 mM EDTA, 4.8 mg/ml BSA, 445 mM NaCl, 47.5 mM spermidine)

4 μl of LR CLONASE™

The reaction was incubated at 25° C. overnight and stopped with 2 μl of proteinase K solution (2 mg/ml). Two μl was used to transform 100 μl of LE DH5α cells and plated on LBamp plates containing XGal. Approximately 3,200 colonies were generated in the transformation mixture with supercoiled entry clones. 5,300 colonies were generated in the transformation mixture with the reaction containing the VspI digested Neo entry clone. The efficiency of the competent cells was 1.2×10⁸ cfu/μg of pUC DNA. All the colonies appeared blue, indicating the presence of the lacZ gene. Nine colonies were streaked onto tet plates containing XGal. Nine out of 9 colonies were tetracycline resistant. Nine colonies were used to inoculate 2 ml of LBamp broth for minipreps. Nine out of 9 minipreps contained a supercoiled plasmid of the correct size (11 kb). Nine miniprep DNAs were digested with EcoRV. A banding pattern was observed that was consistent with the three fragments cloned in the correct orientation.

The resulting nucleic acid product consists of the three fragments linked together and cloned into the destination vector. The structure of these three fragments, as they are inserted into the destination vector, is as follows (arrows indicate the orientation of attB sites with respect to the overlap sequence):

attB1→tet-attB5-Neo-attB21→lacZ←attB2

Example 18 Use of the GATEWAY™ System and Multiple att Sites with Different Specificities to Clone a Lux Operon

The lux operon genes (luxA, luxB, luxC, luxD and luxE) of Vibrio fischeri genomic DNA were amplified using the primers listed immediately below that introduced an optimal Shine-Dalgarno and Kozak sequence (ggaggtatataccatg (SEQ ID NO:118)) at the 5′-end and a T7 promoter and stop codon (gaagctatagtgagtcgtatta) (SEQ ID NO:183) at the 3′-end of each ORF.

TABLE 10
SD 5′ and T7 3′ lux primers.

Primer        Sequence                                                 SEQ ID NO
SD 5′ luxA    ggaggtatataccatgAAGTTTGGAAATATTTGTTTTTC                  119
T7 3′ luxA    gaagctatagtgagtcgtattaTTTAGGTTCTTTTAAGAAAGGAGCGAC        120
SD 5′ luxB    ggaggtatataccatgAAATTTGGATTATTTTTTCTAAAC                 121
T7 3′ luxB    gaagctatagtgagtcgtattaTGGTAAATTCATTTCGATTTTTTGG          122
SD 5′ luxC    ggaggtatataccatgAATAAATGTATTCCAATGATAATTAATGG            123
T7 3′ luxC    gaagctatagtgagtcgtattaTGGGACAAAAACTAAAAACTTATCTTCC       124
SD 5′ luxD    ggaggtatataccatgAAAGATGAAAGTGCTTTTTTTACGATTG             125
T7 3′ luxD    gaagctatagtgagtcgtattaAGCCAATTCTAATAATTCATTTTC           126
SD 5′ luxE    ggaggtatataccatgACTGTCCATACTGAATATAAAAGAAATC             127
T7 3′ luxE    gaagctatagtgagtcgtattaAATCCTTGATATTCTTTTGTATGACATTAGC    128

The PCR products were further amplified with attB-SD and attB-T7 adapter primers listed immediately below utilizing the Shine-Dalgarno and T7 promoter sequences as primer sites to add attB sites to the ends of the PCR products.

TABLE 11
attB SD and T7 adapter primers.

Primer      Sequence                                        SEQ ID NO
B1.6 SD     GGGGACAACTTTGTACAAAAAAGTTGAAggaggtatataccatg    129
B5 T7       GGGGACAACTTTGTATAATAAAGTTGgaagctatagtgagtcgt    130
B5R SD      GGGGACAACTTTATTATACAAAGTTGAAggaggtatataccatg    131
B11 T7      GGGGACAACTTTGTATAGAAAAGTTGgaagctatagtgagtcgt    132
B11R SD     GGGGACAACTTTTCTATACAAAGTTGAAggaggtatataccatg    133
B17 T7      GGGGACAACTTTGTATACAAAAGTTGgaagctatagtgagtcgt    134
B17R SD     GGGGACAACTTTTGTATACAAAGTTGAAggaggtatataccatg    135
B21 T7      GGGGACAACTTTGTATTAAAAAGTTGgaagctatagtgagtcgt    136
B21R SD     GGGGACAACTTTTTAATACAAAGTTGAAggaggtatataccatg    137
B2.10 T7    GGGGACAACTTTGTACAAGAAAGTTGgaagctatagtgagtcgt    138
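The two-step amplification works because the 3′ end of each adapter primer matches the 5′ tail introduced by the first-round primer. The sketch below measures that overlap for two of the adapter primers, using the sequences given in the text and tables above. The function name and the choice of reporting the longest suffix/prefix match are assumptions for illustration; the sketch says nothing about annealing temperature or primer performance.

```python
# 5' tails introduced by the first-round primers (from the text above).
SD_TAIL = "ggaggtatataccatg"
T7_TAIL = "gaagctatagtgagtcgtatta"

# Two adapter primers copied from the adapter primer table.
B1_6_SD = "GGGGACAACTTTGTACAAAAAAGTTGAAggaggtatataccatg"
B5_T7 = "GGGGACAACTTTGTATAATAAAGTTGgaagctatagtgagtcgt"


def priming_overlap(adapter: str, tail: str) -> int:
    """Length of the longest adapter suffix that is also a prefix of tail."""
    adapter, tail = adapter.lower(), tail.lower()
    for k in range(min(len(adapter), len(tail)), 0, -1):
        if adapter.endswith(tail[:k]):
            return k
    return 0


if __name__ == "__main__":
    print("B1.6 SD on SD tail:", priming_overlap(B1_6_SD, SD_TAIL), "nt")
    print("B5 T7 on T7 tail:", priming_overlap(B5_T7, T7_TAIL), "nt")
```

In this reading, the SD adapter covers the entire 16-nucleotide SD tail, while the T7 adapter covers the first 18 nucleotides of the 22-nucleotide T7 tail.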

In this way the following attB PCR products were generated:

attB1.6-SD-luxC-T7-attB5

attB5R-SD-luxD-T7-attB11

attB11R-SD-luxA-T7-attB17

attB17R-SD-luxB-T7-attB21

attB21R-SD-luxE-T7-attB2.10
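The order in which these products are expected to link can be read off the site names: the site at the right end of one product pairs with the correspondingly numbered "R" site at the left end of the next. The following sketch derives the order from that naming rule alone. The tuple representation, the short site labels, and the function name are assumptions for illustration, and the sketch models only the bookkeeping, not the recombination chemistry.

```python
# (left site, gene, right site) for each attB PCR product, in arbitrary order.
FRAGMENTS = [
    ("B17R", "luxB", "B21"),
    ("B1.6", "luxC", "B5"),
    ("B21R", "luxE", "B2.10"),
    ("B5R", "luxD", "B11"),
    ("B11R", "luxA", "B17"),
]


def assembly_order(fragments):
    """Chain fragments so each right site X is followed by left site XR."""
    by_left = {left: (gene, right) for left, gene, right in fragments}
    partners = {right + "R" for _, _, right in fragments}
    # The first fragment is the one whose left site pairs with no right site.
    starts = [left for left in by_left if left not in partners]
    if len(starts) != 1:
        raise ValueError("expected exactly one starting fragment")
    order = []
    left = starts[0]
    while left in by_left:
        gene, right = by_left.pop(left)
        order.append(gene)
        left = right + "R"
    if by_left:
        raise ValueError("some fragments could not be linked")
    return order


if __name__ == "__main__":
    print(" - ".join(assembly_order(FRAGMENTS)))
```

For the five products listed, this gives the order luxC, luxD, luxA, luxB, luxE, flanked by the attB1.6 and attB2.10 sites that pair with the destination vector.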

Each attB PCR product was precipitated with polyethylene glycol and reacted with the appropriate attP plasmid to generate Entry Clones of each lux ORF.

TABLE 12
BP Reaction Setup

Component                          1     2     3     4     5     6     7     8     9     10
TE                                 7 μl  7 μl  7 μl  7 μl  7 μl  7 μl  7 μl  7 μl  7 μl  7 μl
attB1-luxC-attB5 (10 ng/μl)        2 μl  2 μl  —     —     —     —     —     —     —     —
attP1-attP5 (150 ng/μl)            2 μl  2 μl  —     —     —     —     —     —     —     —
attB5R-luxD-attB11 (10 ng/μl)      —     —     2 μl  2 μl  —     —     —     —     —     —
attP5R-attP11 (150 ng/μl)          —     —     2 μl  2 μl  —     —     —     —     —     —
attB11R-luxA-attB17 (10 ng/μl)     —     —     —     —     2 μl  2 μl  —     —     —     —
attP11R-attP17 (150 ng/μl)         —     —     —     —     2 μl  2 μl  —     —     —     —
attB17R-luxB-attB21 (10 ng/μl)     —     —     —     —     —     —     2 μl  2 μl  —     —
attP17R-attP21 (150 ng/μl)         —     —     —     —     —     —     2 μl  2 μl  —     —
attB21R-luxE-attB2 (10 ng/μl)      —     —     —     —     —     —     —     —     2 μl  2 μl
attP21R-attP2 (150 ng/μl)          —     —     —     —     —     —     —     —     2 μl  2 μl
BP Buffer                          4 μl  4 μl  4 μl  4 μl  4 μl  4 μl  4 μl  4 μl  4 μl  4 μl
BP Clonase Storage Buffer          4 μl  —     4 μl  —     4 μl  —     4 μl  —     4 μl  —
BP Clonase                         —     4 μl  —     4 μl  —     4 μl  —     4 μl  —     4 μl

The reactions were incubated at 25° C. overnight. Each reaction was stopped by the addition of 2 μl of Proteinase K (2 mg/ml) solution and incubated 10 minutes at 37° C. Two μl of each reaction was used to transform LE DH5α cells. One hundred μl (1/10) of each transformation was plated on LB agar containing 50 μg/ml kanamycin. The appropriate pENTR-lux clone was isolated from each reaction as determined by rapid miniprep analysis.

The luxA Entry Clone (pENTR-luxA) was digested with VspI to linearize the plasmid in the plasmid backbone. Equal amounts (40 ng) of each of the five lux Entry Clones were mixed with 150 ng of pDEST14 in a single LR reaction containing LR4 buffer and LR Clonase. Negative control reactions were prepared consisting of a no Clonase reaction and a no pENTRluxA reaction.

TABLE 13
LR Reaction Setup

Component                          1     2     3
TE                                 —     4 μl  —
pENTRluxC (20 ng/μl)               4 μl  4 μl  4 μl
pENTRluxD (20 ng/μl)               4 μl  4 μl  4 μl
pENTRluxA/VspI cut (20 ng/μl)      4 μl  —     4 μl
pENTRluxB (20 ng/μl)               4 μl  4 μl  4 μl
pENTRluxE (20 ng/μl)               4 μl  4 μl  4 μl
pDEST14/NcoI (150 ng/μl)           1 μl  1 μl  1 μl
LR4 Buffer                         8 μl  8 μl  8 μl
LR Clonase Storage Buffer          8 μl  —     —
LR Clonase                         —     8 μl  8 μl
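In the setup above, each omitted component is replaced by an equal volume of TE or storage buffer, so the three reactions should have the same total volume. The sketch below tabulates the volumes from the LR reaction setup and sums each column as a consistency check. The dictionary layout and function name are illustrative assumptions; a dash in the table is entered as 0.

```python
# Volumes in microlitres for reactions 1, 2 and 3 (a dash is entered as 0).
LR_SETUP_UL = {
    "TE": (0, 4, 0),
    "pENTRluxC": (4, 4, 4),
    "pENTRluxD": (4, 4, 4),
    "pENTRluxA/VspI cut": (4, 0, 4),
    "pENTRluxB": (4, 4, 4),
    "pENTRluxE": (4, 4, 4),
    "pDEST14/NcoI": (1, 1, 1),
    "LR4 Buffer": (8, 8, 8),
    "LR Clonase Storage Buffer": (8, 0, 0),
    "LR Clonase": (0, 8, 8),
}


def total_volumes(setup):
    """Sum the component volumes of each reaction column."""
    n_reactions = len(next(iter(setup.values())))
    return [sum(row[i] for row in setup.values()) for i in range(n_reactions)]


if __name__ == "__main__":
    for number, total in enumerate(total_volumes(LR_SETUP_UL), start=1):
        print(f"reaction {number}: {total} ul")
```

All three columns come to 37 μl, which confirms that the controls differ from the complete reaction only in the omitted component.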

The reactions were incubated at 25° C. overnight. Each reaction was stopped by the addition of 4 μl Proteinase K (2 mg/ml) solution and incubated for 10 minutes at 37° C. Two μl of each reaction was used to transform LE DH5α cells. One hundred μl (1/10) of each transformation was plated on LB agar containing 100 μg/ml ampicillin.

The transformations generated no colonies for reaction 1 (no clonase), approximately 200 colonies for reaction 2 (no pENTRluxA DNA) and approximately 2500 colonies for reaction 3 (complete reaction). Ten colonies were picked from reaction 3 and examined by miniprep analysis. All 10 clones were determined to be correct based on size of the supercoiled plasmid DNA (10.3 kb) and by diagnostic restriction digests. The synthetic lux operon construct was transformed into BL21SI cells and luciferase activity was monitored by luminometry. Four independent isolates were demonstrated to generate titratable salt-inducible light in BL21SI cells. No light was detected in BL21SI cells containing pUC DNA. Since the light output was generated and detected in live E. coli cells, the functional activity of all five lux genes was confirmed.

Example 19 Generation of pDONR Vectors

As in the example above (lux operon cloning), a collection of vector element Entry Clones was generated by attB PCR cloning. The Entry Clones were designed such that when a set of 4 vector element Entry Clones are reacted together, each vector element is linked together to assemble a new vector (FIG. 26A-26B). In this example two new attP DONOR vectors were constructed.

The following set of attB PCR products was generated:

attB21R-attP1-ccdB-cat-attP2-attB5

attB5R-kan-attB11

attB5R-amp-attB11

attB11R-loxP-attB17

attB17R-pUC ori-attB21

Each attB PCR product was purified by PEG precipitation and reacted with the appropriate attP plasmid to generate Entry Clones of each vector element as follows:

TABLE 14
BP Reaction Setup

Component                                        1     2     3     4     5     6     7     8     9     10
TE                                               7 μl  7 μl  7 μl  7 μl  7 μl  7 μl  7 μl  7 μl  7 μl  7 μl
attB21R-attP1-ccdB-cat-attP2-attB5 (10 ng/μl)    2 μl  2 μl  —     —     —     —     —     —     —     —
attP21R-attP5 (150 ng/μl)                        2 μl  2 μl  —     —     —     —     —     —     —     —
attB5R-kan-attB11 (10 ng/μl)                     —     —     2 μl  2 μl  —     —     —     —     —     —
attP5R-attP11 (150 ng/μl)                        —     —     2 μl  2 μl  —     —     —     —     —     —
attB5R-amp-attB11 (10 ng/μl)                     —     —     —     —     2 μl  2 μl  —     —     —     —
attP5R-attP11 (150 ng/μl)                        —     —     —     —     2 μl  2 μl  —     —     —     —
attB11R-loxP-attB17 (10 ng/μl)                   —     —     —     —     —     —     2 μl  2 μl  —     —
attP11R-attP17 (150 ng/μl)                       —     —     —     —     —     —     2 μl  2 μl  —     —
attB17R-pUC ori-attB21 (10 ng/μl)                —     —     —     —     —     —     —     —     2 μl  2 μl
attP17R-attP21 (150 ng/μl)                       —     —     —     —     —     —     —     —     2 μl  2 μl
BP Buffer                                        4 μl  4 μl  4 μl  4 μl  4 μl  4 μl  4 μl  4 μl  4 μl  4 μl
BP Clonase Storage Buffer                        4 μl  —     4 μl  —     4 μl  —     4 μl  —     4 μl  —
BP Clonase                                       —     4 μl  —     4 μl  —     4 μl  —     4 μl  —     4 μl

The reactions were incubated at 25° C. overnight. Each reaction was stopped by the addition of 2 μl of Proteinase K (2 mg/ml) solution and incubated 10 minutes at 37° C. Two μl of each reaction was used to transform LE DH5α cells. 100 μl (1/10) of each transformation was plated on LB agar containing 50 μg/ml kanamycin. Colonies were picked and used to isolate the following pENTR clones by rapid miniprep analysis:

pENTR-attR21-attP1-ccdB-cat-attP2-attL5 (isolated from reaction 2)

pENTR-attR5-kan-attL11 (isolated from reaction 4)

pENTR-attR5-amp-attL11 (isolated from reaction 6)

pENTR-attR11-loxP-attL17 (isolated from reaction 8)

pENTR-attR17-ori-attL21 (isolated from reaction 10)

The attR21-attP1-ccdB-cat-attP2-attL5 Entry Clone was digested with VspI to linearize the plasmid in the plasmid backbone. Equal amounts (40 ng) of each of four Entry Clones were mixed in a single LR reaction containing LR4 buffer and LR Clonase. Negative control reactions were prepared consisting of a no Clonase reaction and reactions containing no pENTR-attR21-attP1-ccdB-cat-attP2-attL5 DNA.

TABLE 15
LR Reaction Setup

Component                                                         1     2     3     4
TE                                                                —     4 μl  —     —
pENTR-attR21-attP1-ccdB-cat-attP2-attL5, VspI cut (20 ng/μl)      4 μl  —     4 μl  4 μl
pENTR-attR5-kan-attL11 (20 ng/μl)                                 4 μl  4 μl  4 μl  —
pENTR-attR5-amp-attL11 (20 ng/μl)                                 —     —     —     4 μl
pENTR-attR11-loxP-attL17 (20 ng/μl)                               4 μl  4 μl  4 μl  4 μl
pENTR-attR17-ori-attL21 (20 ng/μl)                                4 μl  4 μl  4 μl  4 μl
LR4 Buffer                                                        8 μl  8 μl  8 μl  8 μl
LR Clonase Storage Buffer                                         8 μl  —     —     —
LR Clonase                                                        —     8 μl  8 μl  8 μl

The reactions were incubated at 25° C. overnight. Four μl of proteinase K (2 mg/ml) solution was added to each reaction and 2 μl used to transform DB3.1 cells. One hundred μl (1/10) of the transformation was plated on LB agar containing 20 μg/ml chloramphenicol and 50 μg/ml kanamycin (reactions 1, 2 and 3) or 20 μg/ml chloramphenicol and 100 μg/ml ampicillin (reaction 4).

The transformations generated approximately 5000 and 10,000 colonies for reactions 3 and 4, respectively, compared to the negative controls of approximately 500 colonies for reaction 1 (no clonase) and 80 colonies for reaction 2 (no pENTR-attR21-attP1-ccdB-cat-attP2-attL5 DNA). Six colonies were picked from both reactions 3 and 4 and examined by miniprep analysis. All of the clones were determined to be correct based on size of the supercoiled plasmid DNA and by diagnostic restriction digests. The assembled vectors were shown to be functional by testing their ability to clone attB PCR products.
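One simple way to summarize such plate counts is as a ratio of colonies in a complete reaction to colonies in each negative control. The sketch below does this for reaction 3, which was plated on the same antibiotics as the two controls. The approximate counts are those reported above; the function name and the use of a plain ratio are assumptions for illustration and are not a statistical treatment.

```python
# Approximate colony counts reported for the LR reactions.
COLONIES = {
    "reaction 1 (no clonase)": 500,
    "reaction 2 (no attP1-ccdB-cat-attP2 entry clone)": 80,
    "reaction 3 (complete, kan)": 5000,
    "reaction 4 (complete, amp)": 10000,
}


def fold_over_control(complete: int, control: int) -> float:
    """Ratio of colonies in a complete reaction to those in a control."""
    if control <= 0:
        raise ValueError("control count must be positive to form a ratio")
    return complete / control


if __name__ == "__main__":
    complete = COLONIES["reaction 3 (complete, kan)"]
    for name in ("reaction 1 (no clonase)",
                 "reaction 2 (no attP1-ccdB-cat-attP2 entry clone)"):
        ratio = fold_over_control(complete, COLONIES[name])
        print(f"reaction 3 vs {name}: {ratio:.1f}-fold")
```

Reaction 4 is left out of the comparison because it was selected on ampicillin rather than kanamycin, so the controls plated on kanamycin are not a like-for-like baseline for it.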

Example 20 Construction of attP DONOR Plasmids for Multisite Gateway

Four attP DONOR plasmids were constructed which contain the following arrangements of attP sites (FIG. 26A):

attPx→ccdB-cat←attPy

attPx→ccdB-cat←attPy

attPx→ccdB-cat←attPy

attPx→ccdB-cat←attPy

The plasmids were constructed by PCR amplification of attP sites and attP DONOR vectors using primers containing compatible restriction endonuclease sites. Each PCR product was digested with the appropriate restriction enzyme. The digested attP DONOR vector PCR products were dephosphorylated and ligated to the digested attP sites. The products of the ligations consisted of plasmids containing attP sites cloned into the pDONOR vector in both orientations.

The attP plasmids described above were subsequently used as templates for PCR reactions (FIG. 26B). PCR was performed using primers that would anneal specifically to the core of an attP site and thus create an attL or attR site of any desired specificity at the ends of the PCR products (see the primers used in the methods of Example 9). For each new attP DONOR vector two such PCR products were generated, one consisting of the plasmid backbone (ori-kan) and a second consisting of the ccdB and cat genes. The PCR products were generated and reacted together in LR Clonase reactions to generate new plasmids containing attP sites of any orientation and specificity.

Example 21 Modular Vector Construction

Materials and methods of the invention may be used in conjunction with any site-specific recombinational cloning system. Methods of the invention may be used to generate recombination sites with new specificities (e.g., new att site specificities). The development of sites having differing specificities allows the simultaneous cloning of multiple DNA fragments in a defined order and orientation, for example, in a single reaction. One example of materials and methods for the simultaneous recombinational cloning of multiple fragments is the MultiSite GATEWAY™ system (Invitrogen Corporation, Carlsbad, Calif., catalog no. 12537023). This technology makes complex cloning schemes simpler and more efficient. Methods of the invention may be used in a wide variety of applications including, but not limited to, expression of multiple gene products from a single vector, addition of promoter/tag elements to the ends of nucleic acid molecules (e.g., standard Gateway Entry Clones (att L1/L2)), construction of gene-targeting vectors, engineering and shuffling of protein coding domains, construction of synthetic operons, biological and biochemical pathway engineering and genome engineering.

In the practice of some methods of the invention, one or more nucleic acid molecules comprising one or more recombination sites may be prepared using any technique. For example, a set of nucleic acid molecules may be prepared such that each nucleic acid molecule comprises one or more recombination sites (e.g., two recombination sites) adjacent to a sequence of interest (e.g., recombination sites flanking a sequence of interest). Such nucleic acid molecules may be mixed with a suitable vector (e.g., a vector comprising one or more recombination sites) in the presence of one or more recombination proteins, thereby simultaneously cloning multiple fragments into the vector backbone. Nucleic acid molecules made using methods of the invention may be sequence validated and/or may serve as source clones in the assembly of further nucleic acid molecules. Using methods of the invention may eliminate the need to sequence validate the final assembled products. Further, in some embodiments, in the final assembled nucleic acid molecule, each of the original nucleic acid molecules may be flanked by recombination sites permitting replacement by any desired nucleic acid molecule comprising suitable recombination sites (e.g., sites compatible with those flanking the nucleic acid molecule to be replaced). Thus, methods of the invention provide maximum flexibility in vector construction.

In some embodiments, materials and methods of the invention may be used for the addition of nucleic acid molecules comprising sequences of interest (e.g., promoter sequences, sequences encoding polypeptide tags, etc.) to the 5′ and/or 3′ ends of nucleic acid molecules comprising one or more recombination sites. For example, materials and methods of the invention may be used to prepare nucleic acid molecules comprising various combinations of promoters and ORFs. Such nucleic acid molecules may be used to study differential gene expression, in promoter investigations, to evaluate several different promoters and purification tags (individually and in combination), to optimize protein expression and purification, and to investigate protein domain swapping. Depicted in FIGS. 33 and 34 are some specific examples of materials and methods of the invention. FIG. 33 depicts a method in which two sequences of interest (depicted as an ORF and a 5′-element) are combined into a vector in a single recombination reaction. FIG. 34 depicts a method in which three sequences of interest (depicted as an ORF, a 5′-element and a 3′-element) are combined into a vector in a single recombination reaction.

In some embodiments, materials of the invention may comprise one or more nucleic acid molecules comprising a recombination site. For example, nucleic acid molecules of the invention may comprise one or more of the following sequences:

attB3 (SEQ ID NO: 141) 5′ CAACTTTGTATAATAAAGTTG 3′

attB4 (SEQ ID NO: 142) 5′ CAACTTTGTATAGAAAAGTTG 3′

Preferably, nucleic acid molecules comprising a sequence of one type of recombination site (e.g., an att3 site such as attB3, attP3, attL3, or attR3) will not recombine with a nucleic acid molecule comprising a sequence of a different type of recombination site (e.g., an att4 site such as attB4, attP4, attL4, or attR4). Thus, materials of the invention may include sequence specific recombination groups that do not recombine with non-like sequences.
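The specificity rule described here can be expressed as a small lookup: two sites are treated as partners only when they belong to the same specificity group and are of complementary kinds (attB with attP, attL with attR). The sketch below is a deliberately simplified model under those assumptions. The name parsing, the class, and the function are hypothetical, and site names carrying an "R" orientation suffix (such as attB5R) are not handled.

```python
from typing import NamedTuple


class AttSite(NamedTuple):
    kind: str         # one of "B", "P", "L", "R"
    specificity: str  # e.g. "3" or "4"


# Complementary site kinds: attB pairs with attP, attL pairs with attR.
PARTNER_KIND = {"B": "P", "P": "B", "L": "R", "R": "L"}


def parse_site(name: str) -> AttSite:
    """Parse a name such as 'attB3' into its kind and specificity."""
    if not name.startswith("att") or len(name) < 5 or name[3] not in PARTNER_KIND:
        raise ValueError(f"unrecognised att site name: {name!r}")
    return AttSite(kind=name[3], specificity=name[4:])


def can_recombine(name_a: str, name_b: str) -> bool:
    """Simplified rule: same specificity group and complementary kinds."""
    a, b = parse_site(name_a), parse_site(name_b)
    return a.specificity == b.specificity and PARTNER_KIND[a.kind] == b.kind


if __name__ == "__main__":
    for pair in [("attB3", "attP3"), ("attB3", "attP4"),
                 ("attL4", "attR4"), ("attL3", "attR4")]:
        print(pair, can_recombine(*pair))
```

A table of this kind may help when planning a multi-fragment assembly, since it makes explicit which ends are expected to pair and which are expected to remain inert toward one another.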

In some embodiments of the invention, nucleic acid molecules of the invention may be introduced into host cells. For example, a nucleic acid molecule of the invention may comprise a sequence encoding the ccdB gene. Such nucleic acid molecules may be replicated in DB3.1 cells. Such a nucleic acid molecule may further comprise one or more selectable markers, for example, the kanamycin resistance gene, the ampicillin resistance gene, the chloramphenicol resistance gene, the spectinomycin resistance gene or combinations thereof. Such nucleic acid molecules may be introduced into host cells and selected for using the appropriate antibiotics. For example, nucleic acid molecules of the invention may be selected for using LB media or plates supplemented with Kanamycin/Chloramphenicol at 50 μg/ml and 30 μg/ml, respectively, for nucleic acid molecules comprising the sequences of the kanamycin and chloramphenicol resistance genes, or Ampicillin/Chloramphenicol at 100 μg/ml and 30 μg/ml, respectively, for nucleic acid molecules comprising the sequences of the ampicillin and chloramphenicol resistance genes. Cells comprising nucleic acid molecules of the invention may be amplified in LB media with the appropriate antibiotics.

Specific examples of nucleic acid molecules of the invention include, but are not limited to, pDONR5′ and pDONR3′. These nucleic acid molecules are derivatives of pDONR 221. See FIGS. 29A-B for the nucleic acid sequence of pDONR221, FIGS. 41A-B for vector maps of pDONR5′ and pDONR3′, respectively, and FIG. 54 for a vector map of pDONR221.

Other specific examples of nucleic acid molecules of the invention include cassettes comprising recombination sites flanking one or more selectable markers. Examples of such nucleic acid molecules include, but are not limited to, cassettes comprising attR4-Cm^(R)-ccdB-attR2 and attR4-Cm^(R)-ccdB-attR3, which were cloned into the filled-in Eco RI and Hind III sites of pUC 19 δlac. Clones of correct orientation were determined by restriction enzyme digestion analysis and validated by DNA sequencing. pUC 19 δlac is a lac promoter deletion mutant of pUC 19. The cassettes were excised from pDEST6 R4R2 and pDEST6 R4R3 (available from Invitrogen Corporation, Carlsbad, Calif.) with Eco RV.

Methods of the invention may be used to generate nucleic acid molecules (e.g., PCR products) that will recombine with other nucleic acid molecules of the invention. For example, a nucleic acid molecule of the invention may be constructed so as to have an attB site, which may then be recombined with an attP site to generate a molecule having an attL and/or an attR site. In one embodiment, nucleic acid molecules comprising attB sites may be constructed using any suitable technique and then may be reacted with nucleic acid molecules comprising attP sites to generate nucleic acid molecules comprising attL sites. attL site containing nucleic acid molecules may then be reacted with attR site containing nucleic acid molecules to produce nucleic acid molecules comprising attB sites. Thus, nucleic acid molecules may be constructed that comprise attB sites that can be recombined with a vector comprising an attP site. Such nucleic acid molecules may be constructed, for example, by amplifying a nucleic acid sequence of interest with a primer comprising all or a portion of a recombination site sequence. Thus attB sites may be added to the ends of a sequence of interest (e.g., by PCR) to produce a nucleic acid molecule that can recombine with a nucleic acid molecule comprising attP sites (e.g., pDONR5′) to generate a nucleic acid molecule comprising attL sites (e.g., pENTR5′). Suitable examples of sequences that may be added to a sequence of interest by PCR include, but are not limited to,

att B4 5′ GGGG CA ACT TTG TAT AGA AAA GTT G 3′ (SEQ ID NO:143, which may be added to the 5′ PCR primer), and

att B1 5′ GGGG C TGC TTT TTT GTA CAA ACT TG 3′ (SEQ ID NO:144, which may be added to the 3′ PCR primer).

To generate PCR products that will recombine with pDONR3′ to generate pENTR3′ clones, the following sequences may be added to the 5′ end of 5′ and 3′ PCR primers.

att B2 5′ GGGG CA GCT TTC TTG TAC AAA GTG G 3′ (SEQ ID NO:145, which may be added to the 5′ PCR primer)

att B3 5′ GGGG C AAC TTT GTA TAA TAA AGT TG 3′ (SEQ ID NO:146, which may be added to the 3′ PCR primer)
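The primer design described above can be illustrated with a short Python sketch. It prepends one of the four attB sequences listed above (with the display spaces removed) to the 5′ end of a template-specific primer sequence. The helper name add_attb_tail and the dictionary ATTB_TAILS are hypothetical names introduced only for illustration, and the template-specific sequence in the usage line is a placeholder rather than a sequence from this disclosure.

```python
# attB sequences copied from the text above, with display spaces removed.
ATTB_TAILS = {
    "attB4": "GGGGCAACTTTGTATAGAAAAGTTG",  # SEQ ID NO:143, 5' primer for pDONR5'
    "attB1": "GGGGCTGCTTTTTTGTACAAACTTG",  # SEQ ID NO:144, 3' primer for pDONR5'
    "attB2": "GGGGCAGCTTTCTTGTACAAAGTGG",  # SEQ ID NO:145, 5' primer for pDONR3'
    "attB3": "GGGGCAACTTTGTATAATAAAGTTG",  # SEQ ID NO:146, 3' primer for pDONR3'
}


def add_attb_tail(site, template_specific):
    """Return a primer with the chosen attB sequence at its 5' end.

    Hypothetical helper: `template_specific` is the part of the primer
    that anneals to the sequence of interest, written 5' to 3'.
    """
    seq = template_specific.replace(" ", "").upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("template-specific sequence must contain only A, C, G, T")
    if site not in ATTB_TAILS:
        raise ValueError("unknown attB site: " + site)
    return ATTB_TAILS[site] + seq


# Placeholder template-specific sequence, for illustration only.
primer = add_attb_tail("attB1", "acgt acgt")
```

Keeping the attB sequences in one table makes it harder to pair a PCR product with the wrong donor vector, since each site name maps to exactly one sequence.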

PCR conditions used were as recommended by the product profile sheets of the thermostable polymerases used, with the exception that the annealing temperature of the PCR was set at 45° C.

The arabinose inducible promoter together with its regulatory protein araC was PCR amplified from pBAD HisA (Invitrogen Corporation, Carlsbad, Calif. catalog no. V43001) with the following 5′ and 3′ PCR primers:

5′ PCR primer, B4-AI (SEQ ID NO: 147): 5′ GGGGCAACTTTGTATAGAAAAGTTGTTATGACAACTTGACGGCTACATCATTCACTTT 3′; and

3′ PCR primer, B1-AI (SEQ ID NO: 148): 5′ GGGGCTGCTTTTTTGTACAAACTTGCCATGGTTAATTCCTCCTGTTAGCCCAAAAAAACG 3′.

An additional 3′ PCR primer (B1-Thio, 5′ GGGGCTGCTTTTTTGTACAAACTTGCCAGGTTAGCGTCGAGGAACTCTTTCAACTGAC 3′ (SEQ ID NO:149)) was synthesized to amplify the regulatory protein araC, the arabinose inducible promoter and the thioredoxin gene from the plasmid template pBAD Thio (Invitrogen Corporation, Carlsbad, Calif. catalog no. K37001). The PCR products B4-AI-B1 and B4-AI-Thio-B1 were purified with the Concert Rapid PCR purification system prior to their use in a BP Clonase reaction.

The Standard Gateway Entry clone, pEntr ssGUS (glucuronidase with Shine-Dalgarno signal and stop codon), is available from Invitrogen Corporation (Carlsbad, Calif.).

Two 3′ elements were PCR amplified, B2-V5-His-terminator-B3 (template=pBAD Thio) and B2-ss alacZ19-B3 (template=pUC19). Primers for amplifying B2-V5-His-terminator-B3 were B2-V5HST (5′ GGGCAGCTTTCTTGTACAAAGTGGGGTAAGCCTATCCCTAACCCTCTCCTCGGTCTC 3′ (SEQ ID NO:150)) and B3-V5HST (5′ GGGGCAACTTTGTATAATAAAGTTGAAGGCCCAGTCTTTCGACTGAGCCTTTCGTTTT 3′ (SEQ ID NO:151)). Primers for amplification of B2-ss alacZ19-B3 were B2-19lacZ (5′ GGGGCAGCTTTCTTGTACAAAGTGGAGGAAACAGCTATGACCATGATTACGCCAA 3′ (SEQ ID NO:152)) and B3-19lacZ (5′ GGGGCAACTTTGTATAATAAAGTTGCTATGCGGCATCAGAGCAGATTGTACTGAG 3′ (SEQ ID NO:153)). The PCR products B2-V5-His-terminator-B3 and B2-ss alacZ19-B3 were purified with the Concert Rapid PCR purification system prior to their use in a BP Clonase reaction.

Recombination reactions to generate nucleic acid molecules comprising a sequence of interest flanked by attL sites (e.g., Entry clones) by recombining attB containing PCR products with attP containing nucleic acid molecules (e.g., pDONR5′, pDONR3′, etc.) were carried out in a standard BP clonase reaction (Invitrogen Corporation, Carlsbad, Calif. catalog no. 11789021). Typically 2 or 5 μl of the 20 μl BP Clonase reaction were transformed into 50 μl of One Shot® TOP10 Chemically competent cells (Invitrogen Corporation, Carlsbad, Calif. catalog no. C404003). 450 μl of SOC was added after heat-shock treatment and the cells were allowed to recover at 37° C. for an hour with shaking. 100 μl aliquots of the transformation mix were spread onto appropriate LB agar plates.

The final assembly of nucleic acid molecules comprising a sequence of interest flanked by attL sites (e.g., Entry clones) and nucleic acid molecules comprising attR sites (e.g., Destination vectors) was carried out in an LR reaction. The significant difference between a standard LR reaction and a MultiSite LR reaction is the use of 5×LR4 reaction buffer (also known as 5×MS LR buffer) in place of the standard 5×LR reaction buffer. Additionally, the total moles of plasmids in the reaction was kept below 120 fmoles and the LR reaction was incubated at room temperature (22-25° C.) for 12-16 hours. Each vector was present in the LR reaction at equal molar amounts. Typically 5 μl of the 20 μl LR Clonase reaction was transformed into 50 μl of One Shot® TOP10 Chemically competent cells. 450 μl of SOC was added after heat-shock treatment and the cells were allowed to recover at 37° C. for an hour with shaking. 100 μl aliquots of the transformation mix were spread onto LB-Amp agar plates.
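The equimolar set-up described above can be sketched as a small calculation. The Python example below converts a target amount in fmoles to a mass in ng for each plasmid and checks that the total stays below 120 fmoles. It assumes an average mass of 660 g/mol per base pair of double-stranded DNA, which is a common approximation and is not stated in this disclosure. The plasmid names and sizes in the usage lines, the default of 20 fmoles per plasmid, and the function names are hypothetical.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # assumed average mass of one base pair of dsDNA


def fmol_to_ng(fmol, size_bp):
    """Convert an amount of double-stranded DNA from fmoles to ng."""
    # fmol * 1e-15 mol * (size_bp * 660 g/mol) * 1e9 ng/g
    return fmol * size_bp * AVG_BP_MASS_G_PER_MOL / 1e6


def equimolar_amounts(sizes_bp, fmol_each=20.0, total_limit_fmol=120.0):
    """Return ng of each plasmid for an equimolar LR reaction.

    Raises ValueError if the total is not below the limit given in the text.
    """
    total = fmol_each * len(sizes_bp)
    if total >= total_limit_fmol:
        raise ValueError("total plasmid amount must stay below the limit")
    return {name: fmol_to_ng(fmol_each, bp) for name, bp in sizes_bp.items()}


# Hypothetical plasmid sizes in base pairs, for illustration only.
amounts = equimolar_amounts(
    {"5' entry clone": 4000, "gene entry clone": 5000,
     "3' entry clone": 3500, "destination vector": 4500}
)
```

Because mass per mole scales with plasmid length, equal molar amounts correspond to different masses for plasmids of different sizes, which is why the conversion is done per plasmid.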

Proteins expressed from nucleic acid molecules constructed using the materials and methods of the invention may be detected and/or analyzed using techniques well known in the art. One suitable technique is a chromogenic assay. Five different LB-Ampicillin agar plates (100 mm) were required for the chromogenic assay.

LB-Ampicillin agar plates (LB-Amp).

LB-Ampicillin X-Gal agar plates (LB-Amp/X-Gal).

LB-Ampicillin X-GlcA agar plates (LB-Amp/X-GlcA). These were made by spreading, using glass beads, 100 μl of a 2% X-GlcA solution onto LB-Amp plates an hour before plating of a transformation mix. (X-GlcA, 5-bromo-4-chloro-3-indolyl β-D-glucuronide cyclohexylammonium salt, Sigma-Aldrich catalog numbers B6650 and B4782, was dissolved at 2% in dimethylformamide.)

The agar plates with arabinose were made by spreading 100 μl of a 20% arabinose solution with glass beads onto appropriate agar plates. This was done concurrently with the spreading of X-GlcA when LB-Amp/X-GlcA plus arabinose agar plates were required.

The Invitrogen Corporation, Carlsbad, Core Sequencing Facility sequenced all Entry and Expression clones. Primers used were M13 Forward (5′ GTAAACGACGGCCAG 3′ (SEQ ID NO:154)) and M13 Rev (5′ CAGGAAACAGCTATGAC 3′ (SEQ ID NO:155)). Plasmid DNAs submitted for sequencing were purified with the Concert Midi-prep plasmid purification kits or the SNAP mini prep kits.

The present invention encompasses kits that may comprise one or more components that may be used to link DNA elements to the 5′ and/or 3′ ends of nucleic acid molecules comprising one or more sequences of interest flanked by recombination sites (e.g., standard Gateway Entry clones). Preferably, nucleic acid molecules are linked such that the original translational reading frame of the recombination sites (e.g., att B1 and B2 sites in an Entry clone) is maintained. To assess specificity and efficiency of the assembly process, two assays were employed:

(1) A chromogenic phenotype assay that is dependent on specificity and the proper final order of the assembled fragments.

(2) A bacterial colony count of desired and undesired clones, as determined by the assay described above, which reflects the efficiency of the assembly process.

For the demonstration of specificity and efficiency of linking two DNA fragments, the expression clone depicted in FIG. 35 was assembled.

The transformation mix of the assembly LR Clonase reaction was divided into two aliquots. The first aliquot was plated onto LB-Amp/XGlcA plates and the second aliquot plated onto LB-Amp/XGlcA plus arabinose plates. Plates were incubated at 37° C. and inspected after 12 hours but before 15 hours of incubation. (Most bacteria possess glucuronidase analogs which will hydrolyze X-GlcA to generate the blue chromogenic product; however, these analogs are normally produced at low levels and will only generate a weak positive reaction after 15 hours of incubation at 37° C.) Colony counts from several LR clonase assembly reactions are tabulated in Table 16.

TABLE 16 Efficiency of assembling two DNA fragments by MultiSite LR reaction as determined by colony formation.

Experiment    minus arabinose*    plus arabinose* (blue/white)
1             352                 347/0
2             267                 275/4
3             180                 181/1
4             165                 165/1
5             190                 200/0
6             302                 330/0

Only clones with the AI promoter correctly assembled at the 5′ end of the GUS gene will hydrolyze X-GlcA to produce blue colonies in the presence of arabinose. The white colonies seen in the plus arabinose column were re-streaked on LB-Amp/XGlcA plus arabinose plates and all re-streaked colonies had the blue phenotype. *Numbers are per plate averages.

The cloning fidelity and efficiency of linking two DNA fragments, as reflected by the results in Table 16, appear to be 100%. The minimum number of colonies from a 20 μl LR Clonase assembly reaction was about 3600 colonies.
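As an illustration of how a per-plate count relates to a per-reaction figure, the following Python sketch scales a plate count by the fraction of the reaction that was transformed and the fraction of the transformation mix that was plated. It assumes the volumes described above for LR transformations (5 μl of a 20 μl reaction, and 100 μl plated from approximately 500 μl of recovered cells). Whether the figures in Table 16 were scaled in exactly this way is not stated, so this is a sketch under those assumptions, and the function name is hypothetical.

```python
def colonies_per_reaction(colonies_per_plate,
                          reaction_ul=20.0, transformed_ul=5.0,
                          transformation_mix_ul=500.0, plated_ul=100.0):
    """Scale a per-plate colony count up to the whole recombination reaction.

    Assumes colonies are proportional to the volume plated and to the
    volume of reaction that was transformed.
    """
    plates_per_transformation = transformation_mix_ul / plated_ul
    transformations_per_reaction = reaction_ul / transformed_ul
    return (colonies_per_plate
            * plates_per_transformation
            * transformations_per_reaction)


# With the assumed volumes, each colony on a plate stands for 20 per reaction.
scaled = colonies_per_reaction(180)
```

Under these assumptions a plate count of 180 scales to 3600 colonies per 20 μl reaction.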

As a secondary assay for fidelity and efficiency, random colonies were selected, their plasmid DNA isolated and analyzed by restriction enzyme digest (FIG. 36). All colonies generated the same restriction enzyme digestion pattern as predicted for the expression construct pEXP-AI-ssGUS. Clones from this analysis were amplified for plasmid purification using the Concert Midi-prep plasmid purification kits and submitted for sequencing. The sequencing data also demonstrated that the LR Clonase reaction assembled the entry clones correctly onto the destination vector as predicted and with no anomalies.

The present invention encompasses kits for the attachment of DNA elements to the 5′ and 3′ ends of nucleic acid molecules comprising sequences of interest flanked by recombination sites (e.g., Entry clones) in a single recombination reaction (e.g., an LR reaction). One nucleic acid molecule constructed using materials and methods of the invention is depicted in FIG. 37.

The LR assembly reaction was transformed into TOP10 cells and plated onto LB-Amp plates. After an overnight incubation at 37° C., twenty-five random colonies were picked and re-patched onto LB-Amp plates containing either X-GlcA or X-Gal substrates with and without arabinose.

TABLE 17 Chromogenic phenotype assay for the proper assembly of three DNA fragments using the Modular Vector Construction kit.

Experiment    Colonies/rxn    X-GlcA (−/+arabinose)    X-Gal (−/+arabinose)
1             3760            25 white/25 blue         25 white/25 blue
2             2172            25 white/25 blue         25 white/25 blue
3             2200            25 white/25 blue         25 white/25 blue

Total number of colonies per 20 μl LR reaction is shown in the second column labeled Colonies/rxn. In the absence of arabinose all colonies were white. However, in the presence of arabinose all colonies turned blue with either the X-GlcA or X-Gal substrates, demonstrating proper assembly of the three Entry clones onto the Destination vector pDEST R4R3.

Results tabulated in Table 17 clearly demonstrate that the assembly of three Entry clones onto the Destination vector pDEST R4R3 occurs at an extremely high fidelity with a reasonable output of colonies per reaction. To validate the Chromogenic assay, six randomly selected clones were sequenced and also analyzed by restriction enzyme digest. The DNA sequencing results yielded sequences identical to the predicted sequence of a properly assembled Expression construct and the restriction enzyme digest analysis is seen in FIG. 38.

PCR products flanked by appropriate att B sequences were recombined, in a BP Clonase reaction, with either pDONR5′ or pDONR3′ to generate pEntr5′ or pEntr3′ Entry clones, respectively. pEntr5′ AI, pEntr ssGUS and pEntr3′ ss alacZ19 Entry clones were cloned by BP Clonase reactions with either pDONR5′ or pDONR3′ and PCR product (see above). As a positive control for the BP Clonase reaction, linearized pEXP-AI-ssGUS-ss alacZ19 (also known as pMVC control, FIG. 37) was used as a source of att B containing fragments in the control BP Clonase reactions. The use of a linearized vector allows for an accurate determination of insert in a control BP reaction. The reaction was set up as listed below and the results are seen in Table 18. Mini-prep plasmid DNA was prepared from four random colonies and their restriction digest analysis indicated that all selected clones were correct.

The control BP Clonase reaction contained:

pDONR5′ or pDONR3′ (150 ng)       1 μl
pMVC control (Aat II) (50 ng)     1 μl
5X BP buffer                      4 μl
BP Clonase                        4 μl
T.E.                              10 μl
Final volume                      20 μl

TABLE 18 Colony counts from the control BP Clonase reactions.

BP Clonase Reaction    1 hour reaction    3 hour reaction
pDONR5′                225                197
pDONR3′                368                424

The reactions were performed with linear pMVC control and either pDONR5′ or pDONR3′. The numbers tabulated are averaged from three experiments and represent the average number of colonies from each LB-Kan agar plate.

A new LR reaction buffer is required for MultiSite GATEWAY™ reactions due to the lowered number of colonies generated when performing these reactions with the standard 5×LR reaction buffer. As demonstrated by the results in Table 19, MultiSite GATEWAY™ reactions performed with the standard LR reaction buffer are, at best, only 4% as efficient as those performed with the LR4 reaction buffer. One can successfully use the standard LR reaction buffer for MultiSite GATEWAY™ reactions, but this requires that the total molar amount of vectors in the LR assembly reaction reach 120 fmoles. Exceeding 120 fmoles of total plasmids in a MultiSite LR reaction appears to lower the efficiency of the LR reaction and generate mis-assembled clones. Therefore, to maintain the 100% cloning fidelity of the MultiSite LR reaction and obtain reasonable colony numbers, the LR reaction buffer was optimized.

TABLE 19 LR assembly reactions were performed with either the Standard LR or the LR4 reaction buffer.

LR Reaction           Standard 5×LR Buffer    5×LR4 Buffer
Two Fragment (1)      0                       11200
Two Fragment (2)      165                     3700
Two Fragment (3)      0                       9625
Two Fragment (4)      100                     5000
Three Fragment (1)    0                       3760
Three Fragment (2)    0                       2172
Three Fragment (3)    0                       2200

The number of colonies obtained after the transformation into TOP10 cells determined efficiency of these reactions. The colony counts are reflected as total number of colonies obtained per LR assembly reaction.
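The comparison of the two buffers can be made explicit with a short calculation. The Python sketch below takes the colony counts from Table 19 and expresses each standard-buffer count as a percentage of the corresponding LR4-buffer count. The variable and function names are hypothetical and serve only to illustrate the arithmetic.

```python
# (reaction, colonies with standard 5x LR buffer, colonies with 5x LR4 buffer)
TABLE_19 = [
    ("Two Fragment (1)", 0, 11200),
    ("Two Fragment (2)", 165, 3700),
    ("Two Fragment (3)", 0, 9625),
    ("Two Fragment (4)", 100, 5000),
    ("Three Fragment (1)", 0, 3760),
    ("Three Fragment (2)", 0, 2172),
    ("Three Fragment (3)", 0, 2200),
]


def relative_efficiency_percent(standard, lr4):
    """Colonies with the standard buffer as a percentage of the LR4 count."""
    return 100.0 * standard / lr4


percentages = {
    name: relative_efficiency_percent(standard, lr4)
    for name, standard, lr4 in TABLE_19
}
best_case = max(percentages.values())
```

The largest ratio comes from Two Fragment (2), 165 colonies against 3700, and none of the three-fragment reactions gave colonies with the standard buffer.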

To formulate an optimal MultiSite LR reaction buffer, the concentrations of several buffer components were varied. These components included Tris, sodium chloride, EDTA, glycerol, bovine serum albumin and spermidine. Varying spermidine concentration affected the LR reaction most significantly.

The titration of spermidine was assessed with the LR reaction described above for the Three Fragment Modular Vector Construction Kit (FIGS. 34 and 37); colony counts from this reaction were scored against the final spermidine concentration in the LR reaction. A broad spermidine concentration range was initially assessed and these results are depicted in FIG. 39. From this graph it was decided to focus on the activity of a MultiSite LR reaction with final spermidine concentrations between 7 mM and 10 mM (FIG. 40).

From the results depicted in FIG. 40 it can be inferred that varying spermidine concentration in the range of 7.5 mM to 9.5 mM has little effect on a MultiSite LR reaction. Therefore, it was decided that a final spermidine concentration of 8.5 mM would be optimal for a MultiSite LR reaction. The 5×MS LR buffer composition arrived at for optimal MultiSite LR reactions is:

200 mM Tris-HCl, pH 7.5

5 mM EDTA

40 mM Spermidine

320 mM NaCl

5 mg/ml BSA (Sigma; catalog #A3059)
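As a worked illustration of the buffer composition listed above, the Python sketch below computes the concentration that each component of the 5× buffer contributes once the buffer is diluted five-fold to 1× in a reaction. The dictionary and function names are hypothetical. Note that the buffer alone contributes 8 mM spermidine at 1×; the disclosure does not state how this figure relates to the 8.5 mM final concentration selected above, so no attempt is made here to account for the difference.

```python
# Composition of the 5x MS LR buffer as listed above.
FIVE_X_MS_LR_BUFFER = {
    "Tris-HCl, pH 7.5 (mM)": 200.0,
    "EDTA (mM)": 5.0,
    "Spermidine (mM)": 40.0,
    "NaCl (mM)": 320.0,
    "BSA (mg/ml)": 5.0,
}


def contribution_at_1x(stock, fold=5.0):
    """Return the concentration each component contributes after dilution.

    `fold` is the dilution factor of the stock (5 for a 5x buffer).
    """
    if fold <= 0:
        raise ValueError("dilution factor must be positive")
    return {name: conc / fold for name, conc in stock.items()}


one_x = contribution_at_1x(FIVE_X_MS_LR_BUFFER)
```

For example, one_x maps "Spermidine (mM)" to 8.0 and "NaCl (mM)" to 64.0.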

Exemplary kits useful in the practice of the invention are listed below, and are available from Invitrogen Corporation (Carlsbad, Calif.). Kits may comprise one or more of the following nucleic acid molecules: pDONR5′ (which may be called pDONR P4-P1R), pDONR 221, pDest R4R2, pMVC Control. Kits of the invention may comprise one or more containers containing one or more buffers, for example, 5×MS LR buffer. Kits of the invention may be adapted for the construction of desired nucleic acid molecules comprising portions of three starting nucleic acid molecules (e.g., Three Fragment Modular Vector Construction Kit available from Invitrogen Corporation, Carlsbad, Calif. catalog no. 12537-023). Such kits may comprise one or more nucleic acid molecules such as pDONR5′ (which may be known as pDONR P4-P1R), pDONR3′ (which may be known as pDONR P2R-P3), pDONR 221, pDest R4R3, and/or pMVC Control. Such kits may also comprise one or more containers containing one or more buffers, for example, 5×MS LR buffer. Kits of the invention may also comprise one or more containers containing one or more enzymes and/or enzyme-containing mixtures. Suitable enzyme mixtures include, but are not limited to, Clonase™ mixtures such as LR Clonase™ and/or BP Clonase™. Other suitable enzymes include, but are not limited to, Proteinase K. Maps of exemplary nucleic acid molecules suitable for inclusion in kits of the invention are provided as FIGS. 41A-41E.

Example 22 MultiSite GATEWAY™ BP and LR Recombination Reaction Protocols for Experienced Users

This example provides exemplary components and instructions associated with kits of the invention.

Perform a BP recombination reaction between each attB-flanked DNA fragment and the appropriate attP-containing donor vector to generate an entry clone.

1. Add the following components to a 1.5 ml microcentrifuge tube at room temperature and mix:

attB PCR product (40-100 fmoles)            1-10 μl
pDONR™ vector (supercoiled, 150 ng/ml)      2 μl
5× BP Clonase™ reaction buffer              4 μl
TE Buffer, pH 8.0                           to 16 μl

2. Vortex BP Clonase™ enzyme mix briefly. Add 4 μl to the components above and mix well by vortexing briefly twice.

3. Incubate reaction at 25° C. for 1 hour.

4. Add 2 μl of 2 μg/μl Proteinase K solution and incubate at 37° C. for 10 minutes.

5. Transform 1 μl of the reaction into competent E. coli and select for kanamycin-resistant entry clones.
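The volume bookkeeping in steps 1 and 2 above can be sketched in Python. The function below takes the volume of attB PCR product and returns the volume of each component, with TE buffer making up the pre-mix to 16 μl before 4 μl of enzyme mix is added. The function name and the dictionary keys are hypothetical; the volumes are those given in the protocol above.

```python
def bp_reaction_volumes(pcr_product_ul, vector_ul=2.0, buffer_ul=4.0,
                        premix_total_ul=16.0, enzyme_ul=4.0):
    """Return component volumes (in microliters) for one BP reaction.

    TE buffer fills the pre-mix to `premix_total_ul`; the enzyme mix is
    added afterwards, as in step 2 of the protocol.
    """
    if not 1.0 <= pcr_product_ul <= 10.0:
        raise ValueError("PCR product volume must be between 1 and 10 ul")
    te_ul = premix_total_ul - pcr_product_ul - vector_ul - buffer_ul
    return {
        "attB PCR product": pcr_product_ul,
        "pDONR vector": vector_ul,
        "5x BP Clonase reaction buffer": buffer_ul,
        "TE buffer, pH 8.0": te_ul,
        "BP Clonase enzyme mix": enzyme_ul,
    }


volumes = bp_reaction_volumes(3.0)
total_ul = sum(volumes.values())
```

With 3 μl of PCR product the sketch gives 7 μl of TE buffer and a 20 μl final reaction; at the 10 μl upper limit no TE buffer is added.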

MultiSite GATEWAY™ LR Recombination Reaction

Perform a MultiSite GATEWAY™ LR recombination reaction between multiple entry clones (attL4-5′ element-attR1 + attL1-gene of interest-attL2 + attR2-3′ element-attL3) and the pDEST™R4-R3 vector to generate an expression clone (attB4-5′ element-attB1-gene of interest-attB2-3′ element-attB3).

1. Add the following components to a 1.5 ml microcentrifuge tube at room temperature and mix:

Entry clones (supercoiled, 20-25 fmoles each)    1-11 μl
pDEST™R4-R3 (supercoiled, 60 ng/ml)              1 μl
5× LR Clonase™ Plus reaction buffer              4 μl
TE Buffer, pH 8.0                                to 16 μl

2. Vortex LR Clonase™ Plus enzyme mix briefly. Add 4 μl to the components above and mix well by vortexing briefly twice.

3. Incubate reaction at 25° C. for 16 hours (or overnight).

4. Add 2 μl of 2 μg/μl Proteinase K solution and incubate at 37° C. for 10 minutes.

5. Transform 2 μl of the reaction into competent E. coli and select for ampicillin-resistant expression clones.

Kit Contents and Storage

All kits and components described in this section and Example are available from Invitrogen Corporation (Carlsbad, Calif.), unless otherwise noted.

Shipping/Storage

The MultiSite GATEWAY™ Three-Fragment Vector Construction Kit is shipped on dry ice in four boxes as described below. Upon receipt, store each box as detailed below.

Box    Item                                            Storage
1      Vectors                                         −20° C.
2      BP Clonase™ Enzyme Mix                          −80° C.
3      LR Clonase™ Plus Enzyme Mix                     −80° C.
4      One Shot® TOP10 Chemically Competent E. coli    −80° C.

Vectors

The Vectors box (Box 1) contains the following items. Store Box 1 at −20° C.

Item                      Composition                          Amount
pDONR™ P4-P1R             Lyophilized in TE Buffer, pH 8.0     6 μg
pDONR™ P2R-P3             Lyophilized in TE Buffer, pH 8.0     6 μg
pDONR™ 221                Lyophilized in TE Buffer, pH 8.0     6 μg
pDEST™ R4-R3              Lyophilized in TE Buffer, pH 8.0     6 μg
pMS/GW control plasmid    Lyophilized in TE Buffer, pH 8.0     10 μg

BP Clonase™ Enzyme Mix

The following reagents are supplied with the BP Clonase™ enzyme mix (Box 2). Store Box 2 at −80° C.

Item                              Composition                                                          Amount
BP Clonase™ Enzyme Mix            Proprietary                                                          80 μl
5× BP Clonase™ Reaction Buffer    Proprietary                                                          100 μl
Proteinase K solution             2 μg/μl in: 10 mM Tris-HCl, pH 7.5; 20 mM CaCl₂; 50% glycerol        40 μl
30% PEG/Mg solution               30% PEG 8000/30 mM MgCl₂                                             1 ml

LR Clonase™ Plus Enzyme Mix

The following reagents are supplied with the LR Clonase™ Plus enzyme mix (Box 3). Store Box 3 at −80° C.

Item                                   Composition                                                      Amount
LR Clonase™ Plus Enzyme Mix            Proprietary                                                      80 μl
5× LR Clonase™ Plus Reaction Buffer    Proprietary                                                      100 μl
Proteinase K solution                  2 μg/μl in: 10 mM Tris-HCl, pH 7.5; 20 mM CaCl₂; 50% glycerol    40 μl

One Shot® TOP10 Reagents

The One Shot® TOP10 Chemically Competent E. coli kit (Box 4) contains the following reagents. Transformation efficiency is 1×10⁹ cfu/μg DNA. Store Box 4 at −80° C.

Item                                                        Composition                                                                                            Amount
SOC Medium (may be stored at room temperature or +4° C.)    2% tryptone; 0.5% yeast extract; 10 mM NaCl; 2.5 mM KCl; 10 mM MgCl₂; 10 mM MgSO₄; 20 mM glucose      6 ml
TOP10 chemically competent cells                            —                                                                                                      21 × 50 μl
pUC19 Control DNA                                           10 pg/μl in 5 mM Tris-HCl, 0.5 mM EDTA, pH 8                                                           50 μl

Genotype of TOP10

Note that this strain cannot be used for single-strand rescue of DNA.The genotype of the TOP10 cell line is as follows:

F⁻ mcrA Δ(mrr-hsdRMS-mcrBC) Φ80lacZΔM15 ΔlacX74 recA1 deoR araD139 Δ(ara-leu)7697 galU galK rpsL (Str^(R)) endA1 nupG

Accessory Products

The products listed in this section may be used with the MultiSite GATEWAY™ Three-Fragment Vector Construction Kit, available from Invitrogen Corporation (Carlsbad, Calif. catalog no. 12537-023).

Additional Products

Many of the reagents supplied in the MultiSite GATEWAY™ Three-Fragment Vector Construction Kit as well as other products suitable for use with the kit are available separately from Invitrogen. Ordering information for these reagents is provided below.

Item                                                   Quantity            Catalog no.
BP Clonase™ Enzyme Mix                                 20 reactions        11789-013
LR Clonase™ Plus Enzyme Mix                            20 reactions        12538-013
Library Efficiency DH5α™ Chemically Competent Cells    5 × 0.2 ml          18263-012
One Shot® TOP10 Chemically Competent E. coli           20 × 50 μl          C4040-03
Library Efficiency DB3.1™ Competent Cells              5 × 0.2 ml          11782-018
pDONR™ 221                                             6 μg                12536-017
M13 Forward (−20) Sequencing Primer                    2 μg                N520-02
M13 Reverse Sequencing Primer                          2 μg                N530-02
S.N.A.P.™ MiniPrep Kit                                 100 reactions       K1900-01
S.N.A.P.™ MidiPrep Kit                                 20 reactions        K1910-01
S.N.A.P.™ Gel Purification Kit                         25 reactions        K1999-25
Ampicillin                                             20 ml (10 mg/ml)    11593-019
Kanamycin Sulfate                                      100 ml (10 mg/ml)   15160-054
Platinum® Pfx DNA Polymerase                           100 reactions       11708-013
                                                       250 reactions       11708-021
Platinum® Taq DNA Polymerase High Fidelity             100 reactions       11304-011
                                                       500 reactions       11304-029
Dpn I                                                  100 units           15242-019
React® 4 Buffer                                        2 × 1 ml            16304-016

GATEWAY™ Entry Vectors

The MultiSite GATEWAY™ Three-Fragment kit provides the pDONR™ 221 vector to facilitate creation of attL1 and attL2-flanked entry clones. Alternatively, a variety of GATEWAY™ entry vectors are available from Invitrogen to allow creation of entry clones using TOPO® Cloning or restriction digestion and ligation.

Item                             Quantity         Catalog no.
pENTR/D-TOPO® Cloning Kit        20 reactions     K2400-20
                                 480 reactions    K2400-480
                                 500 reactions    K2400-500
pENTR/SD/D-TOPO® Cloning Kit     20 reactions     K2420-20
                                 480 reactions    K2420-480
                                 500 reactions    K2420-500
pENTR™ 1A                        10 μg            11813-011
pENTR™ 2B                        10 μg            11816-014
pENTR™ 3C                        10 μg            11817-012
pENTR™ 4                         10 μg            11818-010
pENTR™ 11                        10 μg            11819-018

Overview

Introduction

The MultiSite GATEWAY™ Three-Fragment Vector Construction Kit facilitates rapid and highly efficient construction of an expression clone containing your choice of promoter, gene of interest, and termination or polyadenylation sequence. Other sequences of interest may be easily substituted or incorporated, providing added flexibility for your vector construction needs. Based on the GATEWAY™ Technology, the MultiSite GATEWAY™ Technology uses site-specific recombinational cloning to allow simultaneous cloning of multiple DNA fragments in a defined order and orientation.

The MultiSite GATEWAY™ Three-Fragment Vector Construction Kit is designed to help you create a multiple-fragment clone or an expression clone using the MultiSite GATEWAY™ Technology. Details of the GATEWAY™ Technology can be found herein, and in the GATEWAY™ Technology Manual (Invitrogen Corp., Carlsbad, Calif.; Catalog no. 12539-011), which is incorporated by reference herein in its entirety.

This Example provides an overview of the MultiSite Gateway™ Technology, and provides instructions and guidelines to:

1. Design three sets of forward and reverse attB PCR primers, and amplify your three DNA sequences of interest.

2. Perform a BP recombination reaction with each attB PCR product and a specific donor vector to generate three types of entry clones.

3. Perform a MultiSite Gateway™ LR recombination reaction with your three entry clones and the pDEST™R4-R3 destination vector to generate an expression clone which may then be used in the appropriate application or expression system.

Glossary of Terms

To help you understand the terminology used in the MultiSite Gateway™ Technology, a glossary of terms is provided below.

The Gateway™ Technology

Gateway™ is a universal cloning technology based on the bacteriophage lambda site-specific recombination system that provides a rapid and highly efficient way to transfer heterologous DNA sequences into multiple vector systems for functional analysis and protein expression.

Lambda Recombination Reactions

In lambda, recombination occurs between lambda and the E. coli chromosome via specific recombination sequences (att sites), and is catalyzed by a mixture of recombination proteins (Clonase™ enzyme mix; Invitrogen Corporation, Carlsbad, Calif.). The reactions are described in the table below.

Pathway      Reaction                       Catalyzed by . . .
Lysogenic    attB × attP → attL × attR      BP Clonase™ (Int, IHF)
Lytic        attL × attR → attB × attP      LR Clonase™ (Int, Xis, IHF)

Gateway™ Recombination Reactions

The Gateway™ Technology uses modified and optimized att sites to permit transfer of heterologous DNA sequences between vectors. Two recombination reactions constitute the basis of the Gateway™ Technology:

BP Reaction: Facilitates recombination of an attB substrate (e.g., attB PCR product or expression clone) with an attP substrate (donor vector) to create an attL-containing entry clone (see FIG. 42A). This reaction is catalyzed by BP Clonase™ enzyme mix, a mixture of the λ Integrase (Int) and E. coli Integration Host Factor (IHF) proteins.

LR Reaction: Facilitates recombination of an attL-containing entry clone with an attR-containing destination vector to create an attB-containing expression clone (see FIG. 42B). This reaction is catalyzed by LR Clonase™ enzyme mix, a mixture of the λ Int and Excisionase (Xis) proteins, and the E. coli IHF protein.

MultiSite Gateway™ Technology

Introduction

The MultiSite Gateway™ Three-Fragment Vector Construction Kit (Invitrogen Corporation; Carlsbad, Calif.) uses modifications of the site-specific recombination reactions of the Gateway™ Technology to allow simultaneous cloning of three DNA fragments in a defined order and orientation to create your own expression clone. To generate your own expression clone, you will:

1. Amplify your three DNA sequences of interest (i.e., 5′ element, gene of interest, and 3′ element) using the recommended attB primers to generate PCR products that are flanked by attB sites. To ensure that your fragments are joined in a specific order, each PCR product must be flanked by specific attB sites.

2. Use the PCR products in separate BP recombination reactions with three donor vectors (pDONR™P4-P1R, pDONR™221, pDONR™P2R-P3) to generate three entry clones containing your DNA sequences of interest.

3. Use the three entry clones in a single MultiSite Gateway™ LR recombination reaction with a specially designed destination vector, pDEST™R4-R3, to create your expression clone of interest (see FIG. 43).

Modifications to the att Sites

To permit recombinational cloning using the Gateway™ Technology, the wild-type λ att sites have been modified to improve the efficiency and specificity of the Gateway™ BP and LR recombination reactions (see the Gateway™ Technology manual for details).

In MultiSite Gateway™, the att sites have been optimized further to accommodate simultaneous, recombinational cloning of multiple DNA fragments. These modifications include alterations to both the sequence and length of the att sites, resulting in the creation of “new” att sites exhibiting enhanced specificities and the improved efficiency required to clone multiple DNA fragments at one time. In the MultiSite Gateway™ Three-Fragment kit, four att sites are used versus two att sites in the standard Gateway™ Technology.

For example, four attB sites are used in the MultiSite Gateway™ Three-Fragment kit (see table below). Various combinations of these attB sites will flank each PCR product containing your DNA fragment of interest.

MultiSite Gateway™    Gateway™
attB1                 attB1
attB2                 attB2
attB3
attB4

Specificity of the Modified att Sites

In general, the modified att sites in the MultiSite Gateway™ Technologydemonstrate the same specificity as in the Gateway™ Technology. That is:

attB sites react only with attP sites; similarly attB1 sites react only with attP1 sites to generate attL1 sites

attL sites react only with attR sites; similarly attL1 sites react only with attR1 sites to generate attB1 sites

However, depending on the orientation and position of the attB site and attP site in relation to the DNA fragment of interest or the donor vector, respectively, performing the BP recombination reaction can result in creation of an attR site instead of an attL site. Specifically:

attB1 sites react with attP1R sites to generate attR1 sites

attB2 sites react with attP2R sites to generate attR2 sites

In this example, an attB4 and attB1-flanked PCR product is used in a BP recombination reaction with pDONR™P4-P1R.

attB4-PCR product-attB1 × pDONR™P4-P1R → attL4-PCR product-attR1

Because of the orientation and position of the attB1 and attP1R site in the PCR product and donor vector, respectively, the resulting entry clone contains the PCR product flanked by an attL4 site and an attR1 site rather than two attL sites.
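The specificity rules above can be expressed as a small lookup. The Python sketch below returns the site produced when an attB site meets an attP site in a BP reaction: sites with different numbers are treated as non-reactive, a plain attP site yields an attL site, and an attP site carrying the R suffix yields an attR site. The function names and the string format for site names are hypothetical conventions chosen for illustration.

```python
import re


def bp_product_site(attb_site, attp_site):
    """Return the site generated by a BP reaction between two sites.

    Site names follow the text, e.g. "attB4", "attP4", "attP1R".
    """
    b = re.fullmatch(r"attB(\d+)", attb_site)
    p = re.fullmatch(r"attP(\d+)(R?)", attp_site)
    if b is None or p is None:
        raise ValueError("expected an attB site and an attP site")
    if b.group(1) != p.group(1):
        raise ValueError(attb_site + " does not recombine with " + attp_site)
    # An attP site in the "R" orientation gives attR instead of attL.
    prefix = "attR" if p.group(2) else "attL"
    return prefix + b.group(1)


def bp_entry_clone(attb_pair, attp_pair):
    """Return the pair of sites flanking the insert in the entry clone."""
    return tuple(bp_product_site(b, p) for b, p in zip(attb_pair, attp_pair))


# The example from the text: attB4/attB1 product with pDONR P4-P1R.
flanks = bp_entry_clone(("attB4", "attB1"), ("attP4", "attP1R"))
```

For the example in the text, flanks is ("attL4", "attR1"), matching the attL4-PCR product-attR1 entry clone described above.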

MultiSite Gateway™ Donor Vectors

The MultiSite Gateway™ donor vectors are used to clone attB-flanked PCR products to generate entry clones, and contain similar elements as other Gateway™ donor vectors. However, because your PCR products will be flanked by different attB sites, three different donor vectors are required to facilitate generation of the three types of entry clones required for MultiSite Gateway™:

pDONR™P4-P1R: Use to clone attB4 and attB1-flanked PCR products.

pDONR™221: Use to clone attB1 and attB2-flanked PCR products.

pDONR™P2R-P3: Use to clone attB2 and attB3-flanked PCR products.
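The pairing of PCR products with donor vectors listed above can be captured in a lookup table. The following Python sketch maps the pair of attB sites flanking a PCR product to the donor vector named in the list; any other combination is rejected. The dictionary and function names are hypothetical, and the trademark symbols are omitted from the vector names for simplicity.

```python
# (5' attB site, 3' attB site) -> donor vector, as listed above.
DONOR_VECTORS = {
    ("attB4", "attB1"): "pDONR P4-P1R",
    ("attB1", "attB2"): "pDONR 221",
    ("attB2", "attB3"): "pDONR P2R-P3",
}


def donor_vector_for(five_prime_site, three_prime_site):
    """Return the donor vector for a PCR product with the given attB flanks."""
    key = (five_prime_site, three_prime_site)
    if key not in DONOR_VECTORS:
        raise ValueError("no donor vector listed for %s/%s flanks" % key)
    return DONOR_VECTORS[key]


# The three fragments of a three-fragment assembly, in order.
plan = [
    ("5' element", donor_vector_for("attB4", "attB1")),
    ("gene of interest", donor_vector_for("attB1", "attB2")),
    ("3' element", donor_vector_for("attB2", "attB3")),
]
```

Rejecting unlisted combinations reflects the point made above that each PCR product must carry a specific pair of attB sites for the fragments to be joined in the intended order.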

For a map and a description of the features of each pDONR™ vector, see below and FIGS. 53-55.

While pDONR™221 is well suited for use in Gateway™ reactions, the pDONR™P4-P1R and pDONR™P2R-P3 vectors are designed for use in MultiSite Gateway™ applications.

MultiSite Gateway™ Destination Vector

The MultiSite Gateway™ destination vector, pDEST™R4-R3, is designed for use in the MultiSite Gateway™ three-fragment LR recombination reaction with the three entry clones described above. The pDEST™R4-R3 vector contains attR4 and attR3 sites flanking a selection cassette and allows generation of the expression clone of interest. Note that other Gateway™ destination vectors are not typically suitable for use in the MultiSite Gateway™ LR reaction.

For a map and a description of the features of the pDEST™R4-R3 vector,see FIGS. 41D and 56.

LR Clonase™ Plus Enzyme Mix

The MultiSite Gateway™ LR recombination reaction is catalyzed by LR Clonase™ Plus enzyme mix (Invitrogen Corporation, Carlsbad, Calif., catalog no. 12538-013), an optimized LR Clonase™. LR Clonase™ Plus enzyme mix facilitates efficient recombinational cloning of multiple DNA fragments, but may also be used in the standard Gateway™ LR recombination reaction. Note that the standard LR Clonase™ enzyme mix is not well suited for use in the MultiSite Gateway™ LR recombination reaction.

MultiSite Gateway™ BP Recombination Reactions

Introduction

The MultiSite Gateway™ BP recombination reaction facilitates production of entry clones from your three attB-flanked PCR products. Since each PCR product is flanked by a specific combination of attB sites, specific donor vectors must also be used. An illustration of each BP recombination reaction is provided in this section.

Note that the att sites used in MultiSite Gateway™ have been optimized to improve specificity and efficiency of the MultiSite Gateway™ LR recombination reaction, and may vary in size and sequence from those used in the Gateway™ Technology.

attB 5′ Element×pDONR™P4-P1R Recombination Region

The diagram in FIG. 44 depicts the recombination reaction between the attB4 and attB1-flanked PCR product (i.e., attB 5′ element) and pDONR™P4-P1R to create an entry clone and a by-product.

Features of the Recombination Region:

Shaded regions in FIG. 44 correspond to those sequences transferred from the attB 5′ element into the entry clone following recombination. Note that the 5′ element in the entry clone is flanked by attL4 and attR1 sites.

Boxed regions in FIG. 44 correspond to those sequences transferred from the donor vector into the by-product following recombination.

attB Gene×pDONR™221 Recombination Region

FIG. 45 depicts the recombination reaction between the attB1 and attB2-flanked PCR product (i.e., attB gene) and pDONR™221 to create an entry clone and a by-product.

Features of the Recombination Region:

Shaded regions in FIG. 45 correspond to those sequences transferred from the attB PCR product into the entry clone following recombination. Note that the PCR product in the entry clone is flanked by attL1 and attL2 sites, and is suitable for use in all standard Gateway™ applications.

Boxed regions in FIG. 45 correspond to those sequences transferred from the donor vector into the by-product following recombination.

attB 3′ Element×pDONR™P2R-P3 Recombination Region

FIG. 46 depicts the recombination reaction between the attB2 and attB3-flanked PCR product (i.e., attB 3′ element) and pDONR™P2R-P3 to create an entry clone and a by-product.

Features of the Recombination Region:

Shaded regions in FIG. 46 correspond to those sequences transferred from the attB 3′ element into the entry clone following recombination. Note that the 3′ element in the entry clone is flanked by attR2 and attL3 sites.

Boxed regions in FIG. 46 correspond to those sequences transferred from the donor vector into the by-product following recombination.

Features of the MultiSite Gateway™ Vectors

MultiSite Gateway™ Vectors

Two types of MultiSite Gateway™-adapted vectors are available from Invitrogen:

Gateway™ Vector            Characteristics
Donor vector (pDONR™)      Contains attP sites; used to clone attB-flanked PCR products to generate entry clones
Destination vector         Contains attR sites; recombines with multiple entry clones in a MultiSite Gateway™ LR reaction to generate an expression clone

Common Features of the MultiSite Gateway™ Vectors

To enable recombinational cloning and efficient selection of entry or expression clones, the MultiSite Gateway™ donor and destination vectors contain two att sites flanking a cassette containing:

The ccdB gene (see below) for negative selection

Chloramphenicol resistance gene (Cm^(R)) for counterselection

After a BP or MultiSite Gateway™ LR recombination reaction, this cassette is replaced by the gene of interest to generate the entry clone and expression clone, respectively.

ccdB Gene

The presence of the ccdB gene allows negative selection of the donor and destination vectors in E. coli following recombination and transformation. The ccdB protein interferes with E. coli DNA gyrase, thereby inhibiting growth of most E. coli strains (e.g., TOP10, DH5α™). When recombination occurs (i.e., between a destination vector and an entry clone or between a donor vector and an attB PCR product), the ccdB gene is replaced by the gene of interest. Cells that take up unreacted vectors carrying the ccdB gene or by-product molecules retaining the ccdB gene will fail to grow. This allows high-efficiency recovery of the desired clones.
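The selection logic can be illustrated with a deliberately simplified Python model. The class Plasmid, the function colony_forms, and the set CCDB_RESISTANT_STRAINS are hypothetical names introduced only for this illustration. The model assumes that a colony forms only when the plasmid confers resistance to the antibiotic in the plate and the cell is not killed by ccdB; the marker assignments follow the donor vector description given elsewhere in this document (kanamycin resistance on the backbone, ccdB and chloramphenicol resistance in the cassette).

```python
from dataclasses import dataclass

# Strains described in the text as resistant to ccdB effects (gyrA462 mutation).
CCDB_RESISTANT_STRAINS = {"DB3.1"}


@dataclass(frozen=True)
class Plasmid:
    name: str
    has_ccdb: bool
    resistance_markers: frozenset


def colony_forms(plasmid: Plasmid, strain: str, antibiotic: str) -> bool:
    """Simplified rule: the cell must survive both the antibiotic and ccdB."""
    survives_antibiotic = antibiotic in plasmid.resistance_markers
    survives_ccdb = (not plasmid.has_ccdb) or (strain in CCDB_RESISTANT_STRAINS)
    return survives_antibiotic and survives_ccdb


donor = Plasmid("unreacted donor vector", True,
                frozenset({"kanamycin", "chloramphenicol"}))
entry = Plasmid("entry clone", False, frozenset({"kanamycin"}))

for plasmid in (donor, entry):
    for strain in ("TOP10", "DB3.1"):
        print(plasmid.name, strain, colony_forms(plasmid, strain, "kanamycin"))
```

In this model the unreacted donor vector yields colonies only in the ccdB-resistant strain, while the entry clone yields colonies in either strain, which is the behavior the negative selection relies on.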

Methods

Propagating the MultiSite Gateway™ Vectors

The MultiSite Gateway™ Three-Fragment Vector Construction Kit includes the following vectors. See the guidelines below to propagate and maintain these vectors.

Donor Vectors:

pDONR™P4-P1R

pDONR™221

pDONR™P2R-P3

Destination Vector:

pDEST™R4-R3

Control Vector:

pMS/GW

Propagating Donor and Destination Vectors

The pDONR™P4-P1R, pDONR™221, pDONR™P2R-P3, and pDEST™R4-R3 vectors contain the ccdB gene and must be propagated in E. coli strains that are resistant to ccdB effects. To propagate and maintain the vectors, we recommend using the DB3.1™ E. coli strain, which contains a gyrase mutation (gyrA462) that renders it resistant to the ccdB effects (Bernard and Couturier, 1992; Bernard et al., 1993; Miki et al., 1992). Library Efficiency® DB3.1™ Competent Cells are available from Invitrogen (Catalog no. 11782-018) for transformation. To maintain the integrity of the vector, select for transformants in media containing 50 μg/ml kanamycin and 15-30 μg/ml chloramphenicol.

Note: DO NOT use general E. coli cloning strains including TOP10 or DH5α™ for propagation and maintenance as these strains are sensitive to ccdB effects.

Genotype of DB3.1

Host cell strain E. coli DB3.1 (Invitrogen Corporation; Carlsbad, Calif.) has the following genotype: F⁻ gyrA462 endA1 Δ(sr1-recA) mcrB mrr hsdS20(r_(B)⁻, m_(B)⁻) supE44 ara14 galK2 lacY1 proA2 rpsL20(Sm^(R)) xyl5 δleu mtl1.

pMS/GW Vector

To propagate and maintain the pMS/GW plasmid, you may use any recA, endA E. coli strain including TOP10, DH5α, or DH10B for transformation. One Shot® TOP10 Chemically Competent E. coli, included with the kit for transformation, are recommended for use. Select for transformants in media containing 50-100 μg/ml ampicillin.

Types of Entry Clones

To use the MultiSite Gateway™ Three-Fragment kit to construct your own expression clone, you will create 3 types of entry clones, then use these entry clones in a MultiSite Gateway™ LR recombination reaction with a MultiSite Gateway™ destination vector to generate your expression clone. For proper expression of the gene of interest, these entry clones should, at a minimum, contain the sequences described below. Note: Depending on your needs or application of interest, other sequences are possible.

An attL4 and attR1-flanked entry clone containing your 5′ element of interest. The 5′ element typically contains promoter sequences required to control expression of your gene of interest. Other additional sequences including an N-terminal fusion tag may be added.

An attL1 and attL2-flanked entry clone containing your DNA fragment of interest. This DNA fragment generally encodes the gene of interest. To obtain proper expression in the system of choice, remember to include sequences necessary for efficient translation initiation (e.g., Shine-Dalgarno, Kozak consensus sequence, yeast consensus sequence).

An attR2 and attL3-flanked entry clone containing your 3′ element of interest. The 3′ element typically contains transcription termination sequences or polyadenylation sequences required for efficient transcription termination and polyadenylation of mRNA. Other additional sequences including a C-terminal fusion tag may be added.

For more information about how to generate each type of entry clone, see below.
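How the three entry clones and the destination vector fit together can be sketched in Python. The functions lr_product and assemble_expression_clone are illustrative names, not part of any kit or software. The sketch assumes that the pairing rule stated earlier (attL1 reacts only with attR1 to generate attB1) applies in the same way to each numbered pair of sites, and that pDEST™R4-R3 contributes the outer attR4 and attR3 sites.

```python
import re


def lr_product(site_a: str, site_b: str) -> str:
    """attL<n> and attR<n> (in either order) recombine to give attB<n>."""
    a = re.fullmatch(r"att([LR])(\d+)", site_a)
    b = re.fullmatch(r"att([LR])(\d+)", site_b)
    if a is None or b is None:
        raise ValueError("expected names such as 'attL1' or 'attR1'")
    if a.group(2) != b.group(2) or a.group(1) == b.group(1):
        raise ValueError(f"{site_a} does not react with {site_b}")
    return f"attB{a.group(2)}"


def assemble_expression_clone(entry_clones, dest_sites=("attR4", "attR3")):
    """entry_clones: (left site, insert name, right site) tuples in 5' to 3' order."""
    # At each junction, the site on the upstream side pairs with the site
    # on the downstream side.
    upstream = [dest_sites[0]] + [right for _, _, right in entry_clones]
    downstream = [left for left, _, _ in entry_clones] + [dest_sites[1]]
    junctions = [lr_product(u, d) for u, d in zip(upstream, downstream)]
    product = [junctions[0]]
    for (_, insert, _), junction in zip(entry_clones, junctions[1:]):
        product += [insert, junction]
    return product


clones = [
    ("attL4", "5' element", "attR1"),
    ("attL1", "gene of interest", "attL2"),
    ("attR2", "3' element", "attL3"),
]
print(assemble_expression_clone(clones))
# ['attB4', "5' element", 'attB1', 'gene of interest', 'attB2', "3' element", 'attB3']
```

Supplying the entry clones in the wrong order, or with non-matching sites, raises an error in this sketch, which reflects why each fragment must carry its specific pair of att sites.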

Important

If you construct an expression clone containing the elements described above (i.e., promoter of choice + gene of interest + termination or polyadenylation sequence of choice), remember that this expression clone will be expressed transiently in mammalian, yeast, and insect systems, but may be expressed stably in prokaryotic systems. To perform stable expression studies in mammalian, yeast, or insect systems, include a resistance marker in one of the entry clones (generally the attR2 and attL3-flanked entry clone).

Generating attL4 and attR1-Flanked Entry Clones

To generate an attL4 and attR1-flanked entry clone containing your 5′ element of interest:

1. Design appropriate PCR primers and produce your attB4 and attB1-flanked PCR product.

2. Perform a BP recombination reaction between the attB4 and attB1-flanked PCR product and pDONR™P4-P1R to generate the entry clone (see FIG. 47A).

Generating attR2 and attL3-Flanked Entry Clones

To generate an attR2 and attL3-flanked entry clone containing your 3′ element of interest:

1. Design appropriate PCR primers and produce your attB2 and attB3-flanked PCR product.

2. Perform a BP recombination reaction between the attB2 and attB3-flanked PCR product and pDONR™P2R-P3 to generate the entry clone (see FIG. 47B).

Generating attL1 and attL2-Flanked Entry Clones

The attL1 and attL2-flanked entry clone contains your gene of interest and can be used with both MultiSite Gateway™ and traditional Gateway™ applications. This entry clone may be generated using a variety of methods (see FIG. 48).

1. Generate a PCR product containing attB1 and attB2 sites and use this attB PCR product in a BP recombination reaction with the pDONR™221 vector. To use this method, refer to the guidelines and instructions provided herein.

2. Clone a PCR product or a restriction enzyme fragment into an entry (pENTR™) vector (see Entry Vectors below for more information).

3. Generate or obtain a cDNA library cloned into a Gateway™-compatible vector (e.g., attB-containing pCMV SPORT6 or pEXP-AD502 vectors), and use the cDNA clones in a BP recombination reaction with the pDONR™221 vector (see the Gateway™ Technology manual for more information).

Entry Vectors

Many entry vectors are available from Invitrogen to facilitate generation of entry clones. The pENTR/D-TOPO® and pENTR/SD/D-TOPO® vectors allow rapid TOPO® Cloning of PCR products while the pENTR™ vectors allow ligase-mediated cloning of restriction enzyme fragments. All entry vectors include:

attL1 and attL2 sites to allow recombinational cloning of the gene of interest with a destination vector to produce an expression clone.

A Kozak consensus sequence for efficient translation initiation in eukaryotic cells. Some entry vectors include a Shine-Dalgarno sequence for initiation in E. coli (see table below).

Kanamycin resistance gene for selection of plasmid in E. coli.

pUC origin for high-copy replication and maintenance of the plasmid in E. coli.

Entry Vector          Kozak    Shine-Dalgarno    Catalog no.
pENTR/D-TOPO®         X        -                 K2400-20
pENTR/SD/D-TOPO®      X        X                 K2420-20
pENTR™ 1A             X        X                 11813-011
pENTR™ 2B             X        -                 11816-014
pENTR™ 3C             X        X                 11817-012
pENTR™ 4              X        -                 11818-010
pENTR™ 11             X        X                 11819-018

Constructing Entry Clones

To construct an entry clone using one of the pENTR™ vectors, refer to information provided herein for the specific entry vector you are using.

Designing attB PCR Primers

To generate PCR products suitable for use as substrates in a Gateway™ BP recombination reaction with a donor vector, you will need to incorporate attB sites into your PCR products. To facilitate use in MultiSite Gateway™, each PCR product must be flanked by a different combination of attB sites (see table below). Guidelines are provided below to help you design appropriate PCR primers.

DNA Sequence of Interest    Forward PCR Primer    Reverse PCR Primer
5′ element                  attB4                 attB1
Gene of interest            attB1                 attB2
3′ element                  attB2                 attB3

Designing Your PCR Primers

The design of the PCR primers to amplify your DNA sequences of interest is critical for recombinational cloning using MultiSite Gateway™ Technology. Consider the following when designing your PCR primers:

Sequences required to facilitate MultiSite Gateway™ cloning.

Sequences required for efficient expression of the protein of interest (i.e., promoter sequences, termination or polyadenylation sequences, Shine-Dalgarno or Kozak consensus sequences).

Whether or not you wish your PCR product(s) to be fused in frame with any N- or C-terminal fusion tags. Note that sequences encoding the tag are generally incorporated into your PCR product as part of the 5′ or 3′ element.

Guidelines to Design the Forward PCR Primer

When designing the appropriate forward PCR primer, consider the points below. See FIG. 49.

To enable efficient MultiSite Gateway™ cloning, the forward primer may contain the following structure:

1. Four guanine (G) residues at the 5′ end followed by

2. The 22 or 25 bp attB site followed by

3. At least 18-25 bp of template- or gene-specific sequences

Note: If you plan to express native protein in E. coli or mammalian cells, you may want to include a Shine-Dalgarno or Kozak consensus sequence, respectively, in the attB1 forward PCR primer.

The attB4 and attB2 sites end with a guanine (G), and the attB1 site with a thymine (T). If you wish to fuse your PCR product in frame with an N- or C-terminal tag (as appropriate), the primer must include two additional nucleotides to maintain the proper reading frame. Note that the two additional nucleotides in the attB1 primer cannot be AA, AG, or GA because, following the terminal T of the attB1 site, these additions will create a translation termination codon (TAA, TAG, or TGA).
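The forward primer layout and the stop codon restriction can be checked with a small Python sketch. The function build_forward_primer is a hypothetical helper, and the attB strings below are placeholders made of N residues with only the final base taken from the text (T for attB1, G for attB4 and attB2); they are not real attB sequences, which must be taken from the figures. The gene-specific sequence is likewise invented for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def build_forward_primer(attb_site: str, gene_specific: str, frame_fill: str = "") -> str:
    """Assemble 5'-GGGG + attB site + optional 2-nt frame fill + gene-specific sequence."""
    attb_site = attb_site.upper()
    gene_specific = gene_specific.upper()
    frame_fill = frame_fill.upper()
    if len(attb_site) not in (22, 25):
        raise ValueError("attB site should be 22 or 25 bp")
    if len(gene_specific) < 18:
        raise ValueError("use at least 18 bp of template- or gene-specific sequence")
    if frame_fill:
        if len(frame_fill) != 2:
            raise ValueError("frame fill must be exactly two nucleotides")
        # The last base of the attB site plus the two added bases form a codon.
        if attb_site[-1] + frame_fill in STOP_CODONS:
            raise ValueError("frame fill creates a translation termination codon")
    return "GGGG" + attb_site + frame_fill + gene_specific


# Placeholders only: N = unspecified base; final base as stated in the text.
ATTB1_PLACEHOLDER = "N" * 24 + "T"
ATTB4_PLACEHOLDER = "N" * 21 + "G"
GENE_SPECIFIC = "ATGGCTAGCAAAGGAGAAG"  # hypothetical 19-nt sequence

primer = build_forward_primer(ATTB1_PLACEHOLDER, GENE_SPECIFIC, "CC")
print(len(primer), primer)
```

Passing "AA", "AG", or "GA" as the frame fill with the attB1 placeholder raises an error, whereas the same additions after an attB site ending in G are accepted.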

Guidelines to Design the Reverse PCR Primer

When designing your reverse PCR primer, consider the points below. See FIG. 50.

To enable efficient MultiSite Gateway™ cloning, the reverse primer may contain the following structure:

1. Four guanine (G) residues at the 5′ end followed by

2. The 22 or 25 bp attB site followed by

3. 18-25 bp of template- or gene-specific sequences

If you wish to fuse your PCR product in frame with an N- or C-terminal tag, the attB1 and attB2 reverse primers should include one additional nucleotide to maintain the proper reading frame (see FIG. 50).

Any in-frame stop codons between the attB sites and your gene of interest may be removed.

If you do not wish to fuse your PCR product in frame with a C-terminal tag, your gene of interest or the attB2 primer may include a stop codon.

Important

50 nmol of standard purity, desalted oligonucleotides is sufficient for most applications.

Dissolve oligonucleotides to 20-50 μM in water or TE Buffer and verify the concentration before use.

For more efficient cloning of large PCR products (greater than 5 kb), we recommend using HPLC or PAGE-purified oligonucleotides.

Producing attB PCR Products

DNA Templates

The following DNA templates can be used for amplification with attB-containing PCR primers:

Genomic DNA

mRNA

cDNA libraries

Plasmids containing cloned DNA sequences

Recommended Polymerases

We recommend using the following DNA polymerases available from Invitrogen to produce your attB PCR products. Other DNA polymerases are suitable.

To generate PCR products less than 5-6 kb for use in protein expression, use Platinum® Pfx DNA Polymerase (Invitrogen; Catalog no. 11708-013).

To generate PCR products for use in other applications (e.g., functional analysis), use Platinum® Taq DNA Polymerase High Fidelity (Invitrogen; Catalog no. 11304-011).

Producing PCR Products

Standard PCR conditions can be used to prepare attB PCR products. Follow the manufacturer's instructions for the DNA polymerase you are using, and use the cycling parameters suitable for your primers and template. Note: In general, attB sequences do not affect PCR product yield or specificity.

Checking the PCR Product

Remove 1-2 μl from each PCR reaction and use agarose gel electrophoresis to verify the quality and yield of your PCR product. If the PCR product is of the appropriate quality and quantity, proceed to Purifying attB PCR Products, next section.

If your PCR template is a plasmid that contains the kanamycin resistance gene, we suggest treating your PCR reaction mixture with Dpn I before purifying the attB PCR product. This treatment degrades the plasmid (Dpn I recognizes methylated GATC sites) and helps to reduce background in the BP recombination reaction associated with template contamination.

Materials Needed

10×REact 4 Buffer (Invitrogen, Catalog no. 16304-016)

Dpn I (Invitrogen, Catalog no. 15242-019)

Protocol

1. To your 50 μl PCR reaction mixture, add 5 μl of 10×REact 4 Buffer and 5 units of Dpn I.

2. Incubate at 37° C. for 15 minutes.

3. Heat-inactivate the Dpn I at 65° C. for 15 minutes.

4. Proceed to Purifying attB PCR Products.

Purifying attB PCR Products

After you have generated your attB PCR products, we recommend purifying each PCR product to remove attB primers and any attB primer-dimers. Primers and primer-dimers can recombine efficiently with the donor vector in the BP reaction and may increase background after transformation into E. coli. A protocol is provided below to purify your PCR products.

Important

Standard PCR product purification protocols using phenol/chloroform extraction followed by sodium acetate and ethanol or isopropanol precipitation are not recommended for use in purifying attB PCR products. These protocols generally have exclusion limits of less than 100 bp and do not efficiently remove large primer-dimer products.

Materials Needed

You should have the following materials on hand before beginning:

Each attB PCR product (in a 50 μl volume)

TE Buffer, pH 8.0 (10 mM Tris-HCl, pH 8.0, 1 mM EDTA)

30% PEG 8000/30 mM MgCl₂ (supplied with the kit, Box 2)

Agarose gel of the appropriate percentage to resolve your attB PCR products

PEG Purification Protocol

Use the protocol below to purify attB PCR products. Note that this procedure removes DNA less than 300 bp in size.

1. Add 150 μl of TE, pH 8.0 to a 50 μl amplification reaction containing your attB PCR product.

2. Add 100 μl of 30% PEG 8000/30 mM MgCl₂. Vortex to mix thoroughly and centrifuge immediately at 10,000×g for 15 minutes at room temperature.

Note: In most cases, centrifugation at 10,000×g for 15 minutes results in efficient recovery of PCR products. To increase the amount of PCR product recovered, the centrifugation time may be extended or the speed of centrifugation increased.

3. Carefully remove the supernatant. The pellet will be clear and nearly invisible.

4. Dissolve the pellet in 50 μl of TE, pH 8.0 (to a concentration >10 ng/μl).

5. Check the quality and quantity of the recovered attB PCR product on an agarose gel.

6. If the PCR product is suitably purified, proceed to Creating Entry Clones Using the BP Recombination Reaction. If the PCR product is not suitably purified (e.g., attB primer-dimers are still detectable), see below.
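The volumes in steps 1 and 2 fix the final PEG and MgCl₂ concentrations in the precipitation mix. The following Python sketch works through that arithmetic; the function name peg_mix and its parameter names are assumptions made for illustration, and the defaults are the volumes and stock concentrations given in the protocol above.

```python
def peg_mix(pcr_ul=50.0, te_ul=150.0, peg_ul=100.0,
            peg_stock_pct=30.0, mgcl2_stock_mM=30.0):
    """Final concentrations after combining PCR reaction, TE, and PEG/MgCl2 stock."""
    total_ul = pcr_ul + te_ul + peg_ul
    dilution = peg_ul / total_ul  # fraction of the final volume that is stock
    return {
        "total_ul": total_ul,
        "peg_pct": peg_stock_pct * dilution,
        "mgcl2_mM": mgcl2_stock_mM * dilution,
    }


mix = peg_mix()
print(mix)  # 300 ul total, about 10% PEG 8000 and 10 mM MgCl2
```

If the amplification reaction volume differs from 50 μl, scaling the TE and PEG/MgCl₂ volumes in the same 1:3:2 proportion keeps these final concentrations unchanged.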

Additional Purification

If you use the procedure above and your attB PCR product is not suitably purified, you may gel purify your attB PCR product. We recommend using the S.N.A.P.™ Gel Purification Kit available from Invitrogen (Catalog no. K1999-25).

Creating Entry Clones Using the BP Recombination Reaction

Once you have generated your attB PCR products, you will perform a BP reaction to transfer the DNA sequence of interest into an attP-containing donor vector to create an entry clone. To ensure that you obtain the best possible results, we suggest that you read this section and the ones entitled Performing the BP Recombination Reaction and Transforming One Shot® TOP10 Competent Cells before beginning.

Choosing a Donor Vector

Since each attB PCR product is flanked by different attB sites, a specific donor vector is required for each BP recombination reaction. Refer to the table below to determine which donor vector to use in the BP recombination reaction. See FIGS. 51A-51C for an illustration of the recombination region of each entry clone after the BP reaction.

If your PCR product contains . . .    Then use . . .
attB4-PCR product-attB1               pDONR™P4-P1R
attB1-PCR product-attB2               pDONR™221
attB2-PCR product-attB3               pDONR™P2R-P3
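The table can also be expressed as a lookup, shown here as a minimal Python sketch. The dictionary DONOR_VECTORS and the function choose_donor_vector are illustrative names only; the vector names and the att sites flanking each resulting entry clone are those given in this document.

```python
# (left attB site, right attB site) -> (donor vector, att sites flanking the entry clone)
DONOR_VECTORS = {
    ("attB4", "attB1"): ("pDONR P4-P1R", ("attL4", "attR1")),
    ("attB1", "attB2"): ("pDONR 221", ("attL1", "attL2")),
    ("attB2", "attB3"): ("pDONR P2R-P3", ("attR2", "attL3")),
}


def choose_donor_vector(left_attb: str, right_attb: str):
    """Return the donor vector and entry clone att sites for an attB-flanked PCR product."""
    try:
        return DONOR_VECTORS[(left_attb, right_attb)]
    except KeyError:
        raise ValueError(
            f"no donor vector listed for {left_attb}-PCR product-{right_attb}"
        ) from None


vector, flanks = choose_donor_vector("attB2", "attB3")
print(vector, flanks)  # pDONR P2R-P3 ('attR2', 'attL3')
```

Any combination of attB sites not listed in the table raises an error rather than returning a guess.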

Experimental Outline

To generate an entry clone, you will:

1. Perform a BP recombination reaction using the appropriate linear attB PCR product and a supercoiled, attP-containing donor vector (see above).

2. Transform the reaction mixture into a suitable E. coli host.

3. Select for entry clones.

Important

For optimal results, perform the BP recombination reaction using:

Linear attB PCR products

Supercoiled donor vector

Donor Vectors

The pDONR™P4-P1R, pDONR™221, and pDONR™P2R-P3 vectors are supplied with the kit to facilitate generation of entry clones using the BP recombination reaction. The donor vectors contain the following elements:

Two attP sites for recombinational cloning of attB-containing PCR products

The ccdB gene located between the attP sites for negative selection

The chloramphenicol resistance gene (Cm^(R)) located between the two attP sites for counterselection

M13 forward (−20) and M13 reverse primer binding sites to facilitate sequencing of the entry clone, if desired

pUC origin for high-copy replication and maintenance of the plasmid in E. coli

Kanamycin resistance gene for selection of the plasmid in E. coli

For a map and a description of the features of each donor vector, see the Appendix.

Resuspending the Donor Vectors

All donor vectors are supplied as 6 μg of supercoiled plasmid, lyophilized in TE Buffer, pH 8.0. To use, resuspend the pDONR™ plasmid DNA in 40 μl of sterile water to a final concentration of 150 ng/μl.

Recombination Region of the attL4 and attR1-Flanked Entry Clone

The recombination region of the entry clone resulting from pDONR™P4-P1R × attB4-5′ element-attB1 is shown in FIG. 51A.

Features of the Recombination Region:

Shaded regions in FIG. 51A correspond to those DNA sequences transferred from the attB PCR product into the pDONR™P4-P1R vector by recombination. Non-shaded regions are derived from the pDONR™P4-P1R vector.

Bases 674 and 2830 of the pDONR™P4-P1R sequence are marked.

Recombination Region of the attL1 and attL2-Flanked Entry Clone

The recombination region of the entry clone resulting from pDONR™221 × attB1-gene of interest-attB2 is shown in FIG. 51B.

Features of the Recombination Region:

Shaded regions in FIG. 51B correspond to those DNA sequences transferred from the attB PCR product into the pDONR™221 vector by recombination. Non-shaded regions are derived from the pDONR™221 vector.

Bases 651 and 2894 of the pDONR™221 sequence are marked.

Recombination Region of the attR2 and attL3-Flanked Entry Clone

The recombination region of the entry clone resulting from pDONR™P2R-P3 × attB2-3′ element-attB3 is shown in FIG. 51C.

Features of the Recombination Region:

Shaded regions in FIG. 51C correspond to those DNA sequences transferred from the attB PCR product into the pDONR™P2R-P3 vector by recombination. Non-shaded regions are derived from the pDONR™P2R-P3 vector.

Bases 733 and 2889 of the pDONR™P2R-P3 sequence are marked.

Performing the BP Recombination Reaction

General guidelines and instructions are provided below and in the next section to perform a BP recombination reaction using the appropriate attB PCR product and donor vector, and to transform the reaction mixture into a suitable E. coli host to select for entry clones. We recommend including a positive control (see below) and a negative control (no attB PCR product) in your experiment to help you evaluate your results.

Positive Control

pMS/GW is included with the MultiSite Gateway™ Three-Fragment Vector Construction Kit for use as a positive control for each BP reaction, and contains multiple DNA fragments that have been joined using MultiSite Gateway™ Technology.

The pMS/GW plasmid is supplied as 10 μg of supercoiled plasmid, lyophilized in TE Buffer, pH 8.0. To use, resuspend the pMS/GW plasmid DNA in 10 μl of sterile water to a final concentration of 1 μg/μl. To propagate the plasmid, see infra.

Linearizing the Positive Control

You will need to linearize the pMS/GW plasmid before it may be used as a control for each BP reaction. We recommend linearizing the vector by restriction digest using Aat II (New England Biolabs, Catalog no. R0117S).

1. Digest 5 μg of pMS/GW plasmid in a 50 μl reaction using Aat II. Follow the manufacturer's instructions.

2. Heat-inactivate the Aat II at 70° C. for 1 hour.

3. Proceed to Setting Up the BP Reaction. Note that the concentration of the digested DNA is 100 ng/μl.

Determining How Much attB PCR Product and Donor Vector to Use in the Reaction

For optimal efficiency, we recommend using the following amounts of attB PCR product and donor vector in a 20 μl BP recombination reaction:

An equimolar amount of attB PCR product and the donor vector

100 femtomoles (fmol) each of attB PCR product and donor vector is preferred, but the amount of attB PCR product used may range from 40-100 fmol

Note: 100 fmol of donor vector (pDONR™P4-P1R, pDONR™221, or pDONR™P2R-P3) is approximately 300 ng

For large PCR products (>4 kb), use at least 100 fmol of attB PCR product, but no more than 500 ng

Caution

Do not use more than 500 ng of donor vector in a 20 μl BP reaction as this will affect the efficiency of the reaction.

Do not exceed 1 μg of total DNA (donor vector plus attB PCR product) in a 20 μl BP reaction as excess DNA will inhibit the reaction.

Converting Femtomoles (fmol) to Nanograms (ng)

Use the following formula to convert femtomoles (fmol) of DNA to nanograms (ng) of DNA:

ng = (x fmol) × (N) × (660 fg/fmol) × (1 ng/10^(6) fg)

where x is the number of fmoles and N is the size of the DNA in bp.
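As a worked example of this conversion, the following Python sketch implements the formula and its inverse. The function names fmol_to_ng and ng_to_fmol are illustrative, and the 2,500 bp product size is a hypothetical value chosen only to show the arithmetic.

```python
FG_PER_FMOL_PER_BP = 660.0  # average mass of one base pair, in fg per fmol
FG_PER_NG = 1e6


def fmol_to_ng(fmol: float, size_bp: int) -> float:
    """ng = (x fmol)(N)(660 fg/fmol)(1 ng / 10^6 fg)."""
    return fmol * size_bp * FG_PER_FMOL_PER_BP / FG_PER_NG


def ng_to_fmol(ng: float, size_bp: int) -> float:
    """Inverse of fmol_to_ng."""
    return ng * FG_PER_NG / (size_bp * FG_PER_FMOL_PER_BP)


# 100 fmol of a hypothetical 2,500 bp attB PCR product
print(fmol_to_ng(100, 2500))  # 165.0 (ng)
```

By the same formula, the approximately 300 ng quoted above for 100 fmol of donor vector corresponds to a plasmid of roughly 4.5 kb.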

Materials Needed

You should have the following materials on hand before beginning.

Supplied with the kit:

pDONR™ vectors (i.e., pDONR™P4-P1R, pDONR™221, and pDONR™P2R-P3; resuspend each vector to 150 ng/μl with water)

BP Clonase™ enzyme mix (keep at −80° C. until immediately before use)

5×BP Clonase™ Reaction Buffer (thaw and keep on ice until use)

2 μg/μl Proteinase K solution (thaw and keep on ice until use)

pMS/GW positive control (linearize before use; 100 ng/μl)

Supplied by the User:

attB PCR products (i.e., attB4-PCR product-attB1, attB1-PCR product-attB2, or attB2-PCR product-attB3; see above to determine the amount of DNA to use)

TE Buffer, pH 8.0 (10 mM Tris-HCl, pH 8.0, 1 mM EDTA)

Setting Up the BP Reaction

1. For each BP recombination reaction between an appropriate attB PCR product and donor vector, add the following components to 1.5 ml microcentrifuge tubes at room temperature and mix.

Components                              Sample      Negative Control    Positive Control
attB PCR product (40-100 fmol)          1-10 μl     -                   -
pDONR™ vector (150 ng/μl)               2 μl        2 μl                2 μl
pMS/GW positive control (100 ng/μl)     -           -                   4 μl
5× BP Clonase™ Reaction Buffer          4 μl        4 μl                4 μl
TE Buffer, pH 8.0                       to 16 μl    10 μl               6 μl

2. Remove the BP Clonase™ enzyme mix from −80° C. and thaw on ice (approximately 2 minutes).

3. Vortex the BP Clonase™ enzyme mix briefly twice (2 seconds each time).

4. To each sample above, add 4 μl of BP Clonase™ enzyme mix. Mix well by vortexing briefly twice (2 seconds each time).

Reminder: Return BP Clonase™ enzyme mix to −80° C. immediately after use.

5. Incubate reactions at 25° C. for 1 hour.

Note: A 1 hour incubation generally yields a sufficient number of entry clones. Depending on your needs, the length of the recombination reaction can be extended up to 18 hours. An overnight incubation typically yields 5-10 times more colonies than a 1 hour incubation. For large PCR products (≥5 kb), longer incubations (i.e., overnight incubation) will increase the yield of colonies and are recommended.

6. Add 2 μl of the Proteinase K solution to each reaction. Incubate for 10 minutes at 37° C.

7. Proceed to Transforming One Shot® TOP10 Competent Cells, below.

Note: You may store the BP reaction at −20° C. for up to 1 week before transformation, if desired.
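The component volumes for a sample reaction can be worked out with a short Python sketch. The function bp_reaction_setup and its parameter names are hypothetical; the defaults (300 ng of donor vector at 150 ng/μl, 4 μl of 5× buffer, TE to 16 μl, then 4 μl of enzyme mix) and the DNA limits are taken from the text above, while the PCR product size and concentration in the example are invented for illustration.

```python
def bp_reaction_setup(pcr_fmol, pcr_size_bp, pcr_ng_per_ul,
                      donor_ng=300.0, donor_ng_per_ul=150.0):
    """Return component volumes (in ul) for one 20 ul BP reaction."""
    pcr_ng = pcr_fmol * pcr_size_bp * 660.0 / 1e6  # fmol -> ng
    if donor_ng > 500:
        raise ValueError("do not use more than 500 ng of donor vector")
    if pcr_ng + donor_ng > 1000:
        raise ValueError("do not exceed 1 ug of total DNA")
    pcr_ul = pcr_ng / pcr_ng_per_ul
    donor_ul = donor_ng / donor_ng_per_ul
    buffer_ul = 4.0
    te_ul = 16.0 - (pcr_ul + donor_ul + buffer_ul)
    if te_ul < 0:
        raise ValueError("components exceed 16 ul; PCR product is too dilute")
    return {
        "pcr_ul": pcr_ul,
        "donor_ul": donor_ul,
        "buffer_ul": buffer_ul,
        "te_ul": te_ul,
        "enzyme_ul": 4.0,  # BP Clonase enzyme mix, added last
    }


# 100 fmol of a hypothetical 1,000 bp product supplied at 20 ng/ul
setup = bp_reaction_setup(100, 1000, 20.0)
print({name: round(volume, 2) for name, volume in setup.items()})
```

For this example the sketch gives about 3.3 μl of PCR product, 2 μl of donor vector, and about 6.7 μl of TE, for a final volume of 20 μl once the enzyme mix is added.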

Transforming One Shot® TOP10 Competent Cells

Use the guidelines and procedures provided in this section to transform competent E. coli with the BP recombination reaction or the MultiSite Gateway™ LR recombination reaction to select for entry clones or expression clones, respectively. One Shot® TOP10 chemically competent E. coli (Box 4) are included with the kit for use in transformation; however, you may also transform electrocompetent cells. Instructions to transform chemically competent or electrocompetent E. coli are provided in this section.

Note:

You may use any recA, endA E. coli strain including TOP10 (supplied with the kit), DH5α™, DH10B™ or equivalent for transformation. Other strains are suitable. Do not use E. coli strains that contain the F′ episome (e.g., TOP10F′) for transformation. These strains contain the ccdA gene and will prevent negative selection with the ccdB gene.

For your convenience, TOP10, DH5α™, and DH10B™ E. coli are available separately from Invitrogen as chemically competent or electrocompetent cells (see table below).

Item                                                                                Quantity      Catalog No.
Library Efficiency® DH5α™                                                           5 × 200 μl    18263-012
One Shot® TOP10 Chemically Competent E. coli                                        20 × 50 μl    C4040-03
One Shot® Max Efficiency® DH10B™ T1 Phage Resistant Chemically Competent E. coli    20 × 50 μl    12331-013
One Shot® TOP10 Electrocomp E. coli                                                 20 × 50 μl    C4040-52
ElectroMax™ DH10B™                                                                  5 × 100 μl    18290-015

Materials Needed

You should have the following materials on hand before beginning.

Supplied with the kit:

One Shot® TOP10 chemically competent E. coli (thaw on ice 1 vial of One Shot® TOP10 cells for each transformation)

SOC medium (warm to room temperature)

Positive control (e.g., pUC19; use as a control for transformation if desired)

Supplied by the User:

BP recombination reaction (from Setting Up the BP Reaction, Step 7, above) or MultiSite Gateway™ LR recombination reaction.

LB plates containing 50 μg/ml kanamycin (for the BP reaction) or 50-100 μg/ml ampicillin (for the MultiSite Gateway™ LR reaction). Prepare two plates for each transformation; warm at 37° C. for 30 minutes.

42° C. water bath (for chemical transformation)

37° C. shaking and non-shaking incubator

One Shot® TOP10 Chemical Transformation Protocol

1. Into a vial of One Shot® TOP10 chemically competent E. coli, add the following and mix gently. Do not mix by pipetting up and down.

Add 1 μl of the BP recombination reaction or

Add 2 μl of the MultiSite Gateway™ LR recombination reaction. Note: You may transform up to 5 μl of the reaction, if desired.

Reminder: If you are including the transformation control, add 1 μl (10 pg) of pUC19.

2. Incubate on ice for 5 to 30 minutes.

3. Heat-shock the cells for 30 seconds at 42° C. without shaking.

4. Immediately transfer the tubes to ice.

5. Add 250 μl of room temperature SOC medium.

6. Cap the tube tightly and shake the tube horizontally (200 rpm) at 37° C. for 1 hour.

7. Spread the following amount from each transformation on a prewarmed selective plate and incubate overnight at 37° C. We generally plate 2 different volumes to ensure that at least 1 plate has well-spaced colonies.

BP recombination reaction: spread 20 μl and 100 μl

MultiSite Gateway™ LR recombination reaction: spread 50 μl and 100 μl

What You Should See

BP reaction: An efficient BP recombination reaction may produce hundreds of colonies (approximately 3,000 colonies if the entire transformation is plated).

MultiSite Gateway™ LR reaction: An efficient MultiSite Gateway™ LR recombination reaction may produce approximately 100 colonies (approximately 2,000 to 8,000 if the entire transformation is plated).

Transformation by Electroporation

Use only electrocompetent cells for electroporation to avoid arcing. Do not use the One Shot® TOP10 chemically competent cells for electroporation.

1. Into a 0.1 cm cuvette containing 50 μl of electrocompetent E. coli, add the following and mix gently. Do not mix by pipetting up and down. Avoid formation of bubbles.

1 μl of the BP recombination reaction or

2 μl of the MultiSite Gateway™ LR recombination reaction.

2. Electroporate your samples using an electroporator and the manufacturer's suggested protocol.

Note: If you have problems with arcing, see below.

3. Immediately add 450 μl of room temperature SOC medium.

4. Transfer the solution to a 15 ml snap-cap tube (e.g., Falcon) and shake for at least 1 hour at 37° C. to allow expression of the antibiotic resistance marker.

5. Spread 50-100 μl from each transformation on a prewarmed selective plate and incubate overnight at 37° C. We recommend plating 2 different volumes to ensure that at least 1 plate has well-spaced colonies.

6. An efficient recombination reaction may produce several hundred colonies.

To prevent arcing of your samples during electroporation, the volume of cells should be between 50 and 80 μl (0.1 cm cuvettes) or 100 to 200 μl (0.2 cm cuvettes).

If you experience arcing during transformation, try one of the following:

Reduce the voltage normally used to charge your electroporator by 10%

Reduce the pulse length by reducing the load resistance to 100 ohms

Dilute the BP reaction 5-10 fold with sterile water, then transform 1 μl into cells

Sequencing Entry Clones

You may sequence entry clones generated by BP recombination using dye-labeled terminator chemistries including DYEnamic™ energy transfer or BigDye™ reaction chemistries.

Sequencing Primers

To sequence entry clones derived from BP recombination with pDONR™P4-P1R, pDONR™221, and pDONR™P2R-P3, we recommend using the following sequencing primers:

Forward primer M13 Forward (−20): (SEQ ID NO: 154) 5′-GTAAAACGACGGCCAG-3′

Reverse primer M13 Reverse: (SEQ ID NO: 155) 5′-CAGGAAACAGCTATGAC-3′

The M13 Forward (−20) and M13 Reverse Primers (Catalog nos. N520-02 and N530-02, respectively) are available separately from Invitrogen. For more information, see our Web site (www.invitrogen.com) or call Technical Service.
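Before ordering or reusing sequencing primers, it can help to confirm that the sequences were transcribed correctly. The following is a minimal sketch, not part of the kit protocol, that reports the length, GC fraction, and reverse complement of the two primers listed above. The function names are illustrative assumptions.

```python
# Primer sequences (5' to 3') copied from the text above.
PRIMERS = {
    "M13 Forward (-20)": "GTAAAACGACGGCCAG",
    "M13 Reverse": "CAGGAAACAGCTATGAC",
}

# Translation table for Watson-Crick complements of unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def summarize(primers: dict) -> dict:
    """Return length, GC fraction, and reverse complement for each primer."""
    return {
        name: {
            "length": len(seq),
            "gc": gc_fraction(seq),
            "revcomp": reverse_complement(seq),
        }
        for name, seq in primers.items()
    }


if __name__ == "__main__":
    for name, info in summarize(PRIMERS).items():
        print(f"{name}: {info['length']} nt, GC {info['gc']:.0%}, revcomp {info['revcomp']}")
```

The reverse complement is the sequence the primer anneals to, which is useful when searching a vector sequence for the priming site.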

Sequencing Using BigDye™ Chemistry

To sequence entry clones using the BigDye™ chemistry, we recommend the following:

Use at least 500 ng of DNA

Use 5-50 pmoles of primers

Use ¼ reaction and the PCR conditions listed below

PCR Conditions

Use the following PCR conditions for sequencing using BigDye™ chemistry. These conditions are suitable for most inserts, including small inserts.

Step                    Time             Temperature    Cycles
Initial Denaturation    5 minutes        95° C.         1×
Denaturation            10-30 seconds    96° C.         30×
Annealing               5-15 seconds     50° C.         30×
Extension               4 minutes        60° C.         30×
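To plan instrument time, the cycling conditions above can be written down as data and summed. This is a rough sketch that assumes the 30 cycles apply to the denaturation, annealing, and extension steps together, and it ignores temperature ramp times, which depend on the thermal cycler. All names are illustrative.

```python
# Hold times in seconds, taken from the cycling conditions above.
INITIAL_DENATURATION_S = 5 * 60
CYCLES = 30
CYCLED_STEPS = [
    # (step name, (shortest hold, longest hold))
    ("Denaturation", (10, 30)),
    ("Annealing", (5, 15)),
    ("Extension", (4 * 60, 4 * 60)),
]


def hold_time_range_s():
    """Return (minimum, maximum) total hold time in seconds, excluding ramps."""
    per_cycle_min = sum(low for _, (low, _) in CYCLED_STEPS)
    per_cycle_max = sum(high for _, (_, high) in CYCLED_STEPS)
    return (
        INITIAL_DENATURATION_S + CYCLES * per_cycle_min,
        INITIAL_DENATURATION_S + CYCLES * per_cycle_max,
    )


if __name__ == "__main__":
    low, high = hold_time_range_s()
    print(f"Total hold time: {low / 60:.1f} to {high / 60:.1f} minutes")
```

Under these assumptions the hold times alone come to roughly 2.2 to 2.5 hours; the actual run will be longer once ramping is included.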

BigDye™ is a registered trademark of Applied Biosystems

Creating Expression Clones Using the MultiSite Gateway™ LR Recombination Reaction

After you have generated entry clones containing your 5′ element, gene of interest, and 3′ element, you will perform the MultiSite Gateway™ LR recombination reaction to simultaneously transfer the three DNA fragments into the pDEST™R4-R3 destination vector to create an attB-containing expression clone with the following structure:

attB4-5′ element-attB1-gene of interest-attB2-3′ element-attB3

To ensure that you obtain the best possible results, we suggest reading this section and the next section entitled Performing the MultiSite Gateway™ LR Recombination Reaction before beginning.

Experimental Outline

To generate an expression clone, you will:

1. Perform a MultiSite Gateway™ LR recombination reaction using the appropriate entry clones and pDEST™R4-R3 (see below).

2. Transform the reaction mixture into a suitable E. coli host.

3. Select for MultiSite Gateway™ expression clones.

Substrates for the MultiSite Gateway™ LR Recombination Reaction

To perform a three-fragment MultiSite Gateway™ LR recombination reaction, you must have the substrates listed below.

attL4 and attR1-containing entry clone

attL1 and attL2-containing entry clone

attR2 and attL3-containing entry clone

pDEST™R4-R3 destination vector

Keep in mind the following:

It will be difficult to create a three-fragment expression clone using the MultiSite Gateway™ LR recombination reaction if you have any combination of att-flanked entry clones other than the ones listed above.

The pDEST™R4-R3 destination vector should be used for the three-fragment MultiSite Gateway™ LR recombination reaction.
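The requirement that only this combination of att-flanked entry clones works can be illustrated with a small model. The sketch below is a simplification and not the kit's software: it assumes that an attL site recombines only with the attR site carrying the same number and that the junction left in the expression clone is the attB site with that number. The function and variable names are hypothetical.

```python
import re

SITE_PATTERN = re.compile(r"att([LR])(\d+)")


def parse_site(site):
    """Split a site name such as 'attL4' into its type ('L') and number (4)."""
    match = SITE_PATTERN.fullmatch(site)
    if match is None:
        raise ValueError(f"not an attL/attR site: {site!r}")
    return match.group(1), int(match.group(2))


def junction(site_a, site_b):
    """Return the attB site formed when an attL site meets its attR partner."""
    kind_a, num_a = parse_site(site_a)
    kind_b, num_b = parse_site(site_b)
    if num_a != num_b or kind_a == kind_b:
        raise ValueError(f"{site_a} cannot recombine with {site_b}")
    return f"attB{num_a}"


def assemble(destination, entry_clones):
    """Chain the entry clones between the two sites of the destination vector."""
    dest_left, dest_right = destination
    parts = []
    previous_site = dest_left
    for left_site, insert, right_site in entry_clones:
        parts.append(junction(previous_site, left_site))
        parts.append(insert)
        previous_site = right_site
    parts.append(junction(previous_site, dest_right))
    return "-".join(parts)


PDEST_R4_R3 = ("attR4", "attR3")
ENTRY_CLONES = [
    ("attL4", "5' element", "attR1"),
    ("attL1", "gene of interest", "attL2"),
    ("attR2", "3' element", "attL3"),
]

if __name__ == "__main__":
    print(assemble(PDEST_R4_R3, ENTRY_CLONES))
    # prints: attB4-5' element-attB1-gene of interest-attB2-3' element-attB3
```

Swapping in an entry clone with any other pair of sites makes `junction` raise an error, which mirrors why other combinations do not yield the three-fragment expression clone.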

Important:

For optimal results, we recommend performing the MultiSite Gateway™ LR recombination reaction using:

Supercoiled Entry Clones

Supercoiled pDEST™R4-R3

pDEST™R4-R3 Vector

The pDEST™R4-R3 vector is supplied with the kit for use in the MultiSite Gateway™ LR recombination reaction to generate an expression clone containing your three DNA fragments of choice. The pDEST™R4-R3 plasmid contains the following elements:

attR4 and attR3 sites for recombinational cloning of three DNA fragments from the appropriate Gateway™ entry clones

M13 forward (−20) and M13 reverse primer binding sites to facilitate sequencing of the expression clone, if desired

pUC origin for high-copy replication and maintenance of the plasmid in E. coli

Ampicillin resistance gene for selection of the plasmid in E. coli

Important: Note that all other elements required to express your gene of interest in the system of choice must be supplied by the entry clones.

Resuspending the pDEST™R4-R3 Vector

pDEST™R4-R3 is supplied as 6 μg of plasmid, lyophilized in TE, pH 8.0. To use, resuspend the destination plasmid in 100 μl of sterile water to a final concentration of 60 ng/μl.

Determining How Much DNA to Use in the Reaction

For optimal efficiency, we recommend using the following amounts of plasmid DNA (i.e., entry clones and destination vector) in a 20 μl MultiSite Gateway™ LR recombination reaction:

An equimolar amount of each plasmid

20-25 fmol of each entry clone and pDEST™R4-R3 is recommended. Do not use more than 30 fmol of each plasmid.

Note: 20 fmol of pDEST™R4-R3 is approximately 60 ng

Caution:

Do not use more than 120 fmol of total plasmid DNA in a 20 μl MultiSite Gateway™ LR reaction as this will affect the efficiency of the reaction.

Do not exceed 1 μg of total DNA (i.e., 250 ng of each entry clone plus destination vector) in a 20 μl MultiSite Gateway™ LR reaction as excess DNA may inhibit the reaction. If you need to use more than 1 μg of total DNA, scale up the volume of the MultiSite Gateway™ LR reaction.
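The amounts above are easier to apply with a small converter between fmol and ng. The sketch below assumes an average mass of 660 g/mol per base pair of double-stranded DNA, a common approximation that is not stated in this section, and uses the 4248 bp size given for pDEST™R4-R3 elsewhere in this Example. The entry clone sizes in the usage example are hypothetical.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # assumed average mass of one base pair

MAX_FMOL_PER_PLASMID = 30.0
MAX_TOTAL_FMOL = 120.0
MAX_TOTAL_NG = 1000.0


def fmol_to_ng(fmol, size_bp):
    """Mass in ng of `fmol` femtomoles of a plasmid `size_bp` base pairs long."""
    return fmol * size_bp * AVG_BP_MASS_G_PER_MOL / 1e6


def ng_to_fmol(ng, size_bp):
    """Femtomoles contained in `ng` nanograms of a plasmid of `size_bp` base pairs."""
    return ng * 1e6 / (size_bp * AVG_BP_MASS_G_PER_MOL)


def check_reaction(plasmids):
    """Check a 20 ul reaction against the limits above.

    `plasmids` maps a name to (fmol, size_bp). Returns a list of warnings.
    """
    warnings = []
    for name, (fmol, _) in plasmids.items():
        if fmol > MAX_FMOL_PER_PLASMID:
            warnings.append(f"{name}: more than {MAX_FMOL_PER_PLASMID:g} fmol")
    total_fmol = sum(fmol for fmol, _ in plasmids.values())
    total_ng = sum(fmol_to_ng(fmol, size) for fmol, size in plasmids.values())
    if total_fmol > MAX_TOTAL_FMOL:
        warnings.append(f"total plasmid exceeds {MAX_TOTAL_FMOL:g} fmol")
    if total_ng > MAX_TOTAL_NG:
        warnings.append(f"total DNA exceeds {MAX_TOTAL_NG:g} ng")
    return warnings


if __name__ == "__main__":
    print(f"20 fmol of pDEST R4-R3 is about {fmol_to_ng(20, 4248):.0f} ng")
    reaction = {
        "pDEST R4-R3": (20, 4248),
        "5' entry clone": (20, 3500),   # hypothetical size
        "gene entry clone": (20, 4000),  # hypothetical size
        "3' entry clone": (20, 3000),   # hypothetical size
    }
    print(check_reaction(reaction) or "within limits")
```

With these assumptions, 20 fmol of the 4248 bp destination vector works out to about 56 ng, in line with the "approximately 60 ng" noted above.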

Recombination Region of the Expression Clone

The recombination region of the expression clone resulting from pDEST™R4-R3 × attL4-5′ entry clone-attR1 × attL1-entry clone-attL2 × attR2-3′ entry clone-attL3 is shown in FIG. 52.

Features of the Recombination Region:

Shaded regions in FIG. 52 correspond to those DNA sequences transferred from the three entry clones into the pDEST™R4-R3 vector by recombination. Note that the sequences comprising the attB1 and attB2 sites are entirely supplied by the entry clones. Non-shaded regions are derived from the pDEST™R4-R3 vector.

Bases 31 and 1855 of the pDEST™R4-R3 sequence are indicated.

Performing the MultiSite Gateway™ LR Recombination Reaction

Guidelines and instructions are provided in this section to:

Perform a MultiSite Gateway™ LR recombination reaction between suitable entry clones and pDEST™R4-R3 using LR Clonase™ Plus enzyme mix.

Transform the reaction mixture into a suitable E. coli host (see below)

Select for an expression clone

We recommend including a positive control (see below) and a negative control (no entry clones) in your experiment to help you evaluate your results.

E. coli Host

We recommend using the One Shot® TOP10 Chemically Competent E. coli supplied with the kit for transformation. If you wish to use another E. coli strain, note that any recA, endA E. coli strain is suitable. Do not transform the LR reaction mixture into E. coli strains that contain the F′ episome (e.g., TOP10F′). These strains contain the ccdA gene and may prevent negative selection with the ccdB gene.

Note: If you plan to use the One Shot® TOP10 chemically competent cells for transformation, see the section of this Example entitled “Transforming One Shot® TOP10 Competent Cells.”

Positive Control

If you used the pMS/GW plasmid as a control for each BP recombination reaction, you may use the resulting three entry clones as controls in a MultiSite Gateway™ LR recombination reaction with pDEST™R4-R3.

Preparing Purified Plasmid DNA

In many instances you will need to have purified plasmid DNA of each entry clone to perform the MultiSite Gateway™ LR recombination reaction. You may use any method of choice to isolate purified plasmid DNA. We recommend using the S.N.A.P.™ MidiPrep Kit available from Invitrogen (Catalog no. K1910-01) or CsCl gradient centrifugation.

Important:

You should use LR Clonase™ Plus enzyme mix to catalyze the MultiSite Gateway™ LR recombination reaction. Note that the LR Clonase™ enzyme mix (Catalog no. 11791-019) used for standard Gateway™ LR recombination reactions is not optimized for MultiSite Gateway™ LR recombination reactions.

LR Clonase™ Plus enzyme mix is supplied with the kit, but is also available separately from Invitrogen (Carlsbad, Calif.).

Materials Needed

You should have the following materials on hand before beginning.

Supplied with the kit:

pDEST™R4-R3 (60 ng/μl in TE, pH 8.0)

LR Clonase™ Plus enzyme mix (Box 3, keep at −80° C. until immediately before use)

5×LR Clonase™ Plus Reaction Buffer (thaw and keep on ice before use)

2 μg/μl Proteinase K solution

Supplied by the User:

Purified plasmid DNA of your attL4 and attR1-flanked entry clone (supercoiled, 20-25 fmol)

Purified plasmid DNA of your attL1 and attL2-flanked entry clone (supercoiled, 20-25 fmol)

Purified plasmid DNA of your attR2 and attL3-flanked entry clone (supercoiled, 20-25 fmol)

Important: Remember that you will need to add plasmid DNA from three entry clones to the MultiSite Gateway™ LR reaction. Make sure that the plasmid DNA for each entry clone is sufficiently concentrated such that the total amount of entry clone plasmid DNA added to a 20 μl MultiSite Gateway™ LR reaction does not exceed 11 μl.

TE Buffer, pH 8.0 (10 mM Tris-HCl, pH 8.0, 1 mM EDTA)

Appropriate competent E. coli host (e.g., One Shot® TOP10) and growth media for expression

SOC Medium

LB agar plates containing 50-100 μg/ml ampicillin

Setting Up the MultiSite Gateway™ LR Reaction

1. Add the following components to 1.5 ml microcentrifuge tubes at room temperature and mix.

Component                                    Sample             Negative Control
attL4 and attR1 entry clone (20-25 fmol)     1-11 μl (total     none
attL1 and attL2 entry clone (20-25 fmol)     for the three
attR2 and attL3 entry clone (20-25 fmol)     entry clones)
pDEST™R4-R3 vector (60 ng/reaction)          1 μl               1 μl
5X LR Clonase™ Plus Reaction Buffer          4 μl               4 μl
TE Buffer, pH 8.0                            to 16 μl           to 16 μl

2. Remove the LR Clonase™ Plus enzyme mix from −80° C. and thaw on ice (~2 minutes).

3. Vortex the LR Clonase™ Plus enzyme mix briefly twice (2 seconds each time).

4. To each sample above, add 4 μl of LR Clonase™ Plus enzyme mix. Mix well by vortexing briefly twice (2 seconds each time).

Reminder: Return LR Clonase™ Plus enzyme mix to −80° C. immediately after use.

5. Incubate reactions at 25° C. for 16 hours or overnight.

6. Add 2 μl of the Proteinase K solution to each reaction. Incubate for 10 minutes at 37° C.

7. Proceed to transform a suitable E. coli host and select for expression clones.

Note: You may store the MultiSite Gateway™ LR reaction at −20° C. for up to 1 week before transformation, if desired.
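The volumes in step 1 fix how much TE buffer is needed once the entry clone volumes are known. The following is a minimal sketch of that arithmetic, using the 1 μl of vector, 4 μl of 5X buffer, 16 μl pre-enzyme volume, and 4 μl of enzyme mix stated in the steps above; the function name and the example volumes are illustrative assumptions.

```python
VECTOR_UL = 1.0        # pDEST R4-R3, 60 ng
BUFFER_UL = 4.0        # 5X LR Clonase Plus Reaction Buffer
PRE_ENZYME_UL = 16.0   # volume after adding TE in step 1
ENZYME_UL = 4.0        # LR Clonase Plus enzyme mix added in step 4
MAX_ENTRY_UL = 11.0    # limit on combined entry clone volume


def te_volume_ul(entry_clone_volumes_ul, negative_control=False):
    """Volume of TE buffer (ul) needed to bring the mix to 16 ul.

    The negative control receives no entry clones, so their volume is
    replaced by TE.
    """
    entry_total = 0.0 if negative_control else sum(entry_clone_volumes_ul)
    if entry_total > MAX_ENTRY_UL:
        raise ValueError(
            f"entry clones total {entry_total} ul; limit is {MAX_ENTRY_UL} ul"
        )
    return PRE_ENZYME_UL - VECTOR_UL - BUFFER_UL - entry_total


if __name__ == "__main__":
    volumes = [2.0, 3.0, 1.5]  # hypothetical volumes of the three entry clones
    print("Sample TE (ul):", te_volume_ul(volumes))
    print("Negative control TE (ul):", te_volume_ul(volumes, negative_control=True))
    print("Final reaction volume (ul):", PRE_ENZYME_UL + ENZYME_UL)
```

If the entry clones are too dilute to fit within 11 μl, the function raises an error rather than returning a negative TE volume, which corresponds to the instruction to concentrate the plasmid DNA.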

What You Should See

If you use E. coli cells with a transformation efficiency of 1×10⁹ cfu/μg, the MultiSite Gateway™ LR reaction should give approximately 2,000 to 8,000 colonies if the entire transformation is plated.

Once you have obtained an expression clone, proceed to express your recombinant protein in the appropriate system.

Troubleshooting

MultiSite Gateway™ LR & BP Reactions

The table below lists some potential problems and possible solutions that may help you troubleshoot the BP or MultiSite Gateway™ LR recombination reactions.

Problem: Few or no colonies obtained from sample reaction and the transformation control gave colonies

Reason: Incorrect antibiotic used to select for transformants
Solution: Check the antibiotic resistance marker and use the correct antibiotic to select for entry clones or expression clones.

Reason: Recombination reactions were not treated with proteinase K
Solution: Treat reactions with proteinase K before transformation.

Reason: Used incorrect att sites for the reaction
Solution: Use the appropriate entry clones and pDEST™R4-R3 for the MultiSite Gateway™ LR reaction. Use the correct attB PCR product and donor vector (attP) for the BP reaction.

Reason: Clonase™ (Plus) enzyme mix is inactive or didn't use suggested amount of Clonase™ (Plus) enzyme mix
Solution: Test another aliquot of the Clonase™ (Plus) enzyme mix. Store the Clonase™ (Plus) at −80° C. Do not freeze/thaw the Clonase™ (Plus) enzyme mix more than 10 times. Use the recommended amount of Clonase™ (Plus) enzyme mix.

Reason: Used incorrect Clonase™ (Plus) enzyme mix
Solution: Use the LR Clonase™ Plus enzyme mix for the MultiSite Gateway™ LR reaction. Do not use the LR Clonase™ enzyme mix. Use the BP Clonase™ enzyme mix for the BP reaction.

Reason: Too much attB PCR product was used in a BP reaction
Solution: Reduce the amount of attB PCR product used. Use an equimolar ratio of attB PCR product and donor vector (i.e., ~100 fmol each).

Reason: Long attB PCR product or linear attB expression clone (≥5 kb)
Solution: Incubate the BP reaction overnight.

Reason: Too much DNA was used in a MultiSite Gateway™ LR reaction
Solution: Use an equimolar amount of each entry clone and destination vector. Do not exceed 120 fmoles or 1 μg of total DNA in the reaction.

Reason: MultiSite Gateway™ LR reaction not incubated for sufficient time
Solution: Incubate the MultiSite Gateway™ LR reaction at 25° C. for 16 hours or overnight.

Reason: Insufficient amount of E. coli transformed or plated
Solution: MultiSite Gateway™ LR reaction: Transform 2 to 5 μl of the reaction; plate 50 μl or 100 μl. BP reaction: Transform 1 μl of the reaction; plate 20 μl and 100 μl.

Problem: MultiSite Gateway™ LR Reaction: High background in the absence of the entry clones

Reason: MultiSite Gateway™ LR reaction transformed into an E. coli strain containing the F′ episome and the ccdA gene
Solution: Use an E. coli strain that does not contain the F′ episome for transformation (e.g., TOP10, DH5α™).

Reason: Deletions (full or partial) of the ccdB gene from the destination vector
Solution: To maintain the integrity of the vector, propagate in media containing 50-100 μg/ml ampicillin and 15-30 μg/ml chloramphenicol. Prepare plasmid DNA from one or more colonies and verify the integrity of the vector before use.

Reason: Contamination of solution(s) with another plasmid carrying the same antibiotic resistance, or by bacteria carrying a resistance plasmid
Solution: Test for plasmid contamination by transforming E. coli with aliquots of each of the separate solutions used in the MultiSite Gateway™ LR reaction. Test for bacterial contamination by plating an aliquot of each solution directly onto LB plates containing ampicillin.

Problem: Few or no colonies obtained from the transformation control

Reason: Competent cells stored incorrectly
Solution: Store competent cells at −80° C.

Reason: Transformation performed incorrectly
Solution: If you are using One Shot® TOP10 E. coli, follow the protocol. If you are using another E. coli strain, follow the manufacturer's instructions.

Reason: Insufficient amount of E. coli plated
Solution: Increase the amount of E. coli plated.

Problem: Two distinct types of colonies (large and small) appear

Reason: BP reaction: The pDONR™ vector contains deletions or point mutations in the ccdB gene. Note: The negative control will give a similar number of colonies
Solution: Obtain a new pDONR™ vector.

Reason: Loss of plasmid during culture (generally those containing large genes or toxic genes)
Solution: Incubate selective plates at 30° C. instead of 37° C. Confirm whether a deletion has occurred by analyzing the DNA derived from the colonies. Use Stbl2™ E. coli (Invitrogen, Catalog no. 10268-019) to help stabilize plasmids containing large genes during propagation (Trinh, T., et al., FOCUS 16:78-80 (1994)).

attB PCR Cloning

The table below lists some potential problems and possible solutions that may help you troubleshoot the BP recombination reaction when using an attB PCR product as a substrate. These potential problems are in addition to those encountered in the general BP reaction.

Problem: Low yield of attB PCR product obtained after PEG purification

Reason: attB PCR product not diluted with TE
Solution: Dilute with 150 μl of 1X TE, pH 8.0 before adding the PEG/MgCl₂ solution.

Reason: Centrifugation step too short or centrifugation speed too low
Solution: Increase the time and speed of the centrifugation step to 30 minutes and 15,000 x g.

Reason: Lost PEG pellet
Solution: When removing the tube from the microcentrifuge, keep track of the orientation of the outer edge of the tube where the pellet is located. When removing the supernatant from the tube, take care not to disturb the pellet.

Problem: Few or no colonies obtained from a BP reaction with attB PCR product and both attB positive control and transformation control gave expected number of colonies

Reason: attB PCR primers incorrectly designed
Solution: Make sure that each attB PCR primer includes four 5′ terminal Gs and the 22 or 25 bp attB site as specified.

Reason: attB PCR primers contaminated with incomplete sequences
Solution: Use HPLC or PAGE-purified oligonucleotides to generate your attB PCR product.

Reason: attB PCR product not purified sufficiently
Solution: Gel purify your attB PCR product to remove attB primers and attB primer-dimers.

Reason: For large PCR products (≥5 kb), too few attB PCR molecules added to the BP reaction
Solution: Increase the amount of attB PCR product to 40-100 fmol per 20 μl reaction. Note: Do not exceed 500 ng DNA per 20 μl reaction. Incubate the BP reaction overnight.

Reason: Insufficient incubation time
Solution: Increase the incubation time of the BP reaction up to 18 hours.

Problem: Entry clones migrate as 2.2 kb supercoiled plasmids

Reason: BP reaction may have cloned attB primer-dimers
Solution: Purify attB PCR product using the PEG/MgCl₂ purification protocol or gel-purify the attB PCR product. Use a Platinum® DNA polymerase with automatic hot-start capability for higher specificity amplification. Redesign attB PCR primers to minimize potential mutual priming sites leading to primer-dimers.

Feature | Benefit

rrnB T1 and T2 transcription terminators | Protects the cloned gene from expression by vector-encoded promoters, thereby reducing possible toxicity (Orosz, A., et al., Eur. J. Biochem. 201:653-659 (1991)).

M13 forward (−20) priming site | Allows sequencing in the sense orientation.

attP4 and attP1R sites (pDONR™P4-P1R); attP1 and attP2 sites (pDONR™221); attP2R and attP3 sites (pDONR™P2R-P3) | Bacteriophage λ-derived DNA recombination sequences that have been optimized to permit recombinational cloning of DNA fragments from specific attB PCR products (Landy, A., Annu. Rev. Biochem. 58:913-949 (1989)).

ccdB gene | Permits negative selection of the plasmid.

Chloramphenicol resistance gene (Cm^(R)) | Allows counterselection of the plasmid.

M13 reverse priming site | Permits sequencing in the anti-sense orientation.

Kanamycin resistance gene | Allows selection of the plasmid in E. coli.

pUC origin and replisome assembly site | Permits high-copy replication and maintenance of the plasmid in E. coli.

pDEST™R4-R3 (4248 bp) contains the following elements. All features have been functionally tested.

Feature | Benefit

M13 forward (−20) priming site | Allows sequencing in the sense orientation.

attR4 and attR3 sites | Bacteriophage λ-derived DNA recombination sequences that have been optimized to permit recombinational cloning of DNA fragments from specific attL-flanked entry clones (Landy, 1989).

Chloramphenicol resistance gene (Cm^(R)) | Allows counterselection of the plasmid.

ccdB gene | Permits negative selection of the plasmid.

M13 reverse priming site | Permits sequencing in the anti-sense orientation.

pUC origin and replisome assembly site | Permits high-copy replication and maintenance of the plasmid in E. coli.

Ampicillin resistance gene (β-lactamase) | Allows selection of the plasmid in E. coli.

bla promoter | Permits expression of the ampicillin resistance gene.

Description

pMS/GW is a 5898 bp control vector, and was generated using the MultiSite Gateway™ LR recombination reaction between pDEST™R4-R3 and three entry clones containing the araC gene and araBAD promoter, gus gene, and lacZα fragment, respectively. This expression clone is designed for use as a control for each BP recombination reaction.

Glossary of Terms Used in this Example

attL, attR, attB, and attP

The recombination sites from bacteriophage lambda that are utilized in the Gateway™ Technology.

attL always recombines with attR in a reaction mediated by the LR Clonase™ enzyme mix (for standard Gateway™ reactions) or LR Clonase™ Plus enzyme mix (for MultiSite Gateway™ reactions). The LR reaction is the basis for the entry clone(s) × destination vector reaction. Recombination between attL and attR sites yields attB and attP sites on the resulting plasmids.

attB sites always recombine with attP sites in a reaction mediated by the BP Clonase™ enzyme mix. The BP reaction is the basis for the reaction between the donor vector (pDONR™) and PCR products or other clones containing attB sites. Recombination between attB and attP sites yields attL and attR sites on the resulting plasmids.
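The two definitions above describe the BP and LR reactions as inverses of one another. As an illustration only, the pairing rules can be captured in a small lookup; the dictionary and function names below are hypothetical.

```python
# Site types that recombine with each other, and the site types they yield.
PRODUCTS = {
    frozenset("LR"): ("B", "P"),  # LR reaction: attL x attR -> attB + attP
    frozenset("BP"): ("L", "R"),  # BP reaction: attB x attP -> attL + attR
}

ENZYME_MIX = {
    frozenset("LR"): "LR Clonase (or LR Clonase Plus) enzyme mix",
    frozenset("BP"): "BP Clonase enzyme mix",
}


def recombine(site_type_a, site_type_b):
    """Return the pair of site types produced by recombining two att site types."""
    key = frozenset((site_type_a, site_type_b))
    if key not in PRODUCTS:
        raise ValueError(f"att{site_type_a} does not recombine with att{site_type_b}")
    return PRODUCTS[key]


if __name__ == "__main__":
    for pair in (("L", "R"), ("B", "P")):
        products = recombine(*pair)
        print(
            f"att{pair[0]} x att{pair[1]} -> att{products[0]} + att{products[1]}"
            f" ({ENZYME_MIX[frozenset(pair)]})"
        )
```

Feeding the products of one reaction back into `recombine` returns the original pair of site types, which reflects the reversibility described in the glossary.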

BP Clonase™ Enzyme Mix

A proprietary mix (available from Invitrogen Corporation; Carlsbad, Calif.; catalog nos. 11789-013 and 11789-021) of lambda recombination proteins that mediates the attB×attP recombination reaction.

ccdB Gene

A gene which encodes a protein that interferes with E. coli DNA gyrase, thereby inhibiting the growth of standard E. coli hosts. This gene is present on Gateway™ destination, donor, and supercoiled entry vectors. When recombination occurs between a destination vector and an entry clone, the ccdB gene is replaced by the gene of interest. Cells that take up unreacted vectors carrying the ccdB gene, or by-product molecules that retain the ccdB gene, will fail to grow. This allows high-efficiency recovery of only the desired clones.

DB3.1™ Competent Cells

These cells (available from Invitrogen Corporation, Carlsbad, Calif.) are resistant to the effects of the ccdB gene product and are used to propagate vectors that contain the ccdB gene (e.g., donor, supercoiled entry, and destination vectors).
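The selection logic described in the ccdB and DB3.1™ entries can be summarized as two conditions: the plasmid must confer resistance to the antibiotic on the plate, and either the plasmid lacks ccdB or the host tolerates it. The sketch below is a simplified model under that assumption; it ignores the chloramphenicol marker and the effect of the F′ episome, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plasmid:
    name: str
    resistance: str  # antibiotic the plasmid confers resistance to
    has_ccdB: bool   # True if the plasmid still carries the ccdB gene


def forms_colony(plasmid, plate_antibiotic, host_ccdB_resistant=False):
    """Predict whether a transformant carrying `plasmid` grows on the plate."""
    survives_antibiotic = plasmid.resistance == plate_antibiotic
    survives_ccdB = (not plasmid.has_ccdB) or host_ccdB_resistant
    return survives_antibiotic and survives_ccdB


EXPRESSION_CLONE = Plasmid("expression clone", "ampicillin", has_ccdB=False)
UNREACTED_PDEST = Plasmid("unreacted pDEST R4-R3", "ampicillin", has_ccdB=True)
ENTRY_CLONE = Plasmid("unreacted entry clone", "kanamycin", has_ccdB=False)

if __name__ == "__main__":
    for plasmid in (EXPRESSION_CLONE, UNREACTED_PDEST, ENTRY_CLONE):
        grows = forms_colony(plasmid, "ampicillin")
        print(f"{plasmid.name}: {'grows' if grows else 'no growth'} on ampicillin in a standard host")
    print(
        "unreacted pDEST R4-R3 in a ccdB-resistant host:",
        forms_colony(UNREACTED_PDEST, "ampicillin", host_ccdB_resistant=True),
    )
```

In this model only the expression clone gives colonies on ampicillin plates in a standard host such as TOP10, while the destination vector itself can be propagated in a ccdB-resistant host such as DB3.1™.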

Destination Vector

Gateway™-adapted expression vectors which contain attR sites and allow recombination with entry clones.

Donor Vector (pDONR™)

A Gateway™ vector containing attP sites. This vector is used for cloning PCR products and DNA sequences of interest flanked by attB sites (expression clones) to generate entry clones. When PCR fragments modified with attB sites are recombined with the pDONR™ vector in a BP reaction, they yield an entry clone:

PCR fragment (attB sites) + pDONR™ vector (attP sites) → entry clone

Entry Clone

The result of cloning a DNA segment into an entry vector or donor vector. For MultiSite Gateway™ applications, the entry clone contains the DNA sequence of interest flanked by attL sites or a combination of attL and attR sites. The entry clone can be used for subsequent transfers into destination vectors.

Entry Vector (pENTR™)

A Gateway™ vector containing attL1 and attL2 sites used for cloning DNA fragments using either TOPO® Cloning or conventional restriction enzymes and ligase.

Expression Clone

The result of subcloning the DNA of interest from an entry clone into a destination vector of choice by LR recombination. For MultiSite Gateway™ applications, the expression clone contains DNA fragments transferred from multiple entry clones into a single destination vector. Each DNA fragment of interest in the expression clone is flanked by attB sites:

Entry clone(s) + destination vector → expression clone

Gateway™ Technology

A universal cloning technology (available from Invitrogen Corporation; Carlsbad, Calif.) based on the site-specific recombination properties of bacteriophage lambda to allow highly efficient movement of a DNA sequence of interest into multiple vector systems. See U.S. Pat. Nos. 5,888,732; 6,143,557; 6,171,861; 6,270,969; and 6,277,608, the disclosures of all of which are incorporated herein by reference in their entireties.

LR Clonase™ Plus Enzyme Mix

A proprietary mix (available from Invitrogen Corporation; Carlsbad, Calif., catalog no. 12538-013) of lambda and E. coli recombination proteins that mediates the attL × attR recombination reaction. This enzyme mix has been optimized for demanding applications including MultiSite Gateway™, but is also suitable for use in standard Gateway™ applications.

REFERENCES

-   Bernard, P., and Couturier, M., J. Mol. Biol. 226:735-745 (1992)
-   Bernard, P., et al., J. Mol. Biol. 234:534-541 (1993)
-   Kozak, M., Nucleic Acids Res. 15:8125-8148 (1987)
-   Kozak, M., J. Cell Biology 115:887-903 (1991)
-   Kozak, M., Proc. Natl. Acad. Sci. USA 87:8301-8305 (1990)
-   Landy, A., Annu. Rev. Biochem. 58:913-949 (1989)
-   Miki, T., et al., J. Mol. Biol. 225:39-52 (1992)
-   Orosz, A., et al., Eur. J. Biochem. 201:653-659 (1991)
-   Ptashne, M., A Genetic Switch: Phage (Lambda) and Higher Organisms, Cell Press, Cambridge, Mass. (1992)
-   Shine, J., and Dalgarno, L., Eur. J. Biochem. 57:221-230 (1975)
-   Trinh, T., et al., FOCUS 16:78-80 (1994)

The invention illustratively described herein suitably may be practiced in the absence of any element or elements, limitation or limitations which is not specifically disclosed herein. Thus, for example, in each instance herein any of the terms “comprising,” “consisting essentially of,” and “consisting of” may be replaced with either of the other two terms. The terms and expressions that have been employed are used as terms of description and not of limitation, and there is no intention that in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed. Thus, it should be understood that although the present invention has been specifically disclosed herein, optional features, modification and variation of the concepts herein disclosed may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this invention as defined by the appended claims. In addition, where features or aspects of the invention are described in terms of Markush groups, those skilled in the art will recognize that the invention is also thereby described in terms of any individual member or subgroup of members of the Markush group.

The invention has been described broadly and generically herein. Each of the narrower species and subgeneric groupings falling within the generic disclosure also form part of the invention. This includes the generic description of the invention with a proviso or negative limitation removing any subject matter from the genus, regardless of whether or not the excised material is specifically recited herein. Other aspects of the invention are within the following claims.

All publications, patents and patent applications mentioned in this specification are indicative of the level of skill of those skilled in the art to which this invention pertains, and are herein incorporated by reference to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated by reference.

1-13. (canceled)
 14. A method of producing a population of hybrid nucleic acid molecules comprising: (a) mixing at least a first population of nucleic acid molecules comprising one or more recombination sites with at least one target nucleic acid molecule comprising one or more recombination sites; and (b) causing some or all of the nucleic acid molecules of the at least first population to recombine with all or some of the target nucleic acid molecules, thereby forming the population of hybrid nucleic acid molecules.
 15. The method of claim 14, wherein the recombination is caused by mixing the first population of nucleic acid molecules and the target nucleic acid molecule with one or more recombination proteins under conditions which favor the recombination.
 16. The method of claim 15, wherein the recombination proteins comprise one or more proteins selected from the group consisting of: (a) Cre; (b) Int; (c) IHF; (d) Xis; (e) Fis; (f) Hin; (g) Gin; (h) Cin; (i) Tn3 resolvase; (j) TndX; (k) XerC; and (l) XerD.
 17. The method of claim 14, further comprising mixing the first population of nucleic acid molecules and the target nucleic acid molecule with at least a second population of nucleic acid molecules comprising one or more recombination sites.
 18. The method of claim 14, wherein the recombination sites comprise one or more recombination sites selected from the group consisting of: (a) lox sites; (b) psi sites; (c) dif sites; (d) cer sites; (e) frt sites; (f) att sites; and (g) mutants, variants, and derivatives of the recombination sites of (a), (b), (c), (d), (e), or (f) which retain the ability to undergo recombination.
 19. The method of claim 14, further comprising selecting for the population of hybrid nucleic acid molecules.
 20. The method of claim 14, further comprising selecting for the population of hybrid nucleic acid molecules and against the first population of nucleic acid molecules and against the target nucleic acid molecules.
 21. The method of claim 20, further comprising selecting against cointegrate molecules and byproduct molecules.
 22. A method for targeting or mutating a target gene or nucleotide sequence comprising: (a) obtaining at least one first nucleic acid molecule comprising one or more recombination sites and one or more selectable markers, wherein the first nucleic acid molecule comprises one or more nucleotide sequences homologous to the target gene or nucleotide sequence; and (b) contacting the first nucleic acid molecule with one or more target genes or nucleotide sequences under conditions sufficient to cause homologous recombination at one or more sites between the target gene or nucleotide sequence and the first nucleic acid molecule, thereby causing insertion of all or a portion of the first nucleic acid molecule within the target gene or nucleotide sequence.
 23. The method of claim 22, wherein the target gene or nucleotide sequence is inactivated.
 24. The method of claim 22, further comprising selecting for a host cell containing the target gene or nucleotide sequence.
 25. A method of joining n nucleic acid segments, wherein n is an integer greater than 2, comprising: (a) providing a 1^(st) through an n^(th) nucleic acid segment, each segment flanked by two recombination sites, wherein the recombination sites are selected such that one of the two recombination sites flanking the i^(th) segment, n_(i), reacts with one of the recombination sites flanking the n_(i−1) ^(th) segment and the other recombination site flanking the i^(th) segment reacts with one of the recombination sites flanking the n_(i+1) ^(th) segment; and (b) contacting the segments with one or more recombination proteins under conditions causing the segments to join.
 26. The method of claim 25, wherein the recombination proteins comprise one or more proteins selected from the group consisting of: (a) Cre; (b) Int; (c) IHF; (d) Xis; (e) Fis; (f) Hin; (g) Gin; (h) Cin; (i) Tn3 resolvase; (j) TndX; (k) XerC; and (l) XerD.
 27. The method of claim 25, wherein the recombination sites which recombine with each other comprise att sites having identical seven base pair overlap regions.
 28. The method of claim 25, further comprising inserting the nucleic acid segments joined in step (b) into a vector.
 29. The method of claim 25, wherein the joined nucleic acid segments undergo intramolecular recombination to form a circular molecule.
 30. The method of claim 25, wherein one or more of the nucleic acid segments encodes a selectable marker.
 31. The method of claim 25, wherein one or more of the nucleic acid segments contains an origin of replication.
 32. A kit for joining, deleting, or replacing nucleic acid segments, the kit comprising (1) one or more recombination proteins or a composition comprising one or more recombination proteins, (2) at least one nucleic acid molecule comprising one or more recombination sites having at least two different recombination specificities, and (3) one or more components selected from the group consisting of: (a) nucleic acid molecules comprising additional recombination sites; (b) one or more enzymes having ligase activity; (c) one or more enzymes having polymerase activity; (d) one or more enzymes having reverse transcriptase activity; (e) one or more enzymes having restriction endonuclease activity; (f) one or more primers; (g) one or more nucleic acid libraries; (h) one or more supports; (i) one or more buffers; (j) one or more detergents or solutions containing detergents; (k) one or more nucleotides; (l) one or more terminating agents; (m) one or more transfection reagents; (n) one or more host cells; and (o) instructions for using the kit components.
 33. The kit of claim 32, wherein the recombination sites having at least three different recombination specificities each comprising att sites with different seven base pair overlap regions. 